Does C++ extractor support to process code with unity build? #14479

Open
nautaa opened this issue Oct 12, 2023 · 15 comments
Labels
acknowledged (GitHub staff acknowledges this issue), question (Further information is requested)

Comments


nautaa commented Oct 12, 2023

Description of the issue
Does the C++ extractor support processing code that uses a unity build? I recently tried to use CodeQL on a C++ project with a unity build, but extraction failed for many files.
unity build: https://en.wikipedia.org/wiki/Unity_build
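For readers unfamiliar with the term: in a unity build, several `.cpp` files are `#include`d into one generated source file, so the compiler (and any tool that observes compiler invocations) sees one large translation unit instead of many small ones. The following is only a minimal sketch of that idea. The file names `a.cpp` and `b.cpp` and all function names are hypothetical, and the two "files" are pasted inline so the example compiles on its own.

```cpp
#include <cassert>

// A real unity file would typically contain only lines such as:
//   #include "a.cpp"
//   #include "b.cpp"
// Below, the contents of those hypothetical files are written out inline.

// ---- contents of hypothetical a.cpp ----
namespace a_unit {
namespace {
int helper() { return 1; }  // file-local helper in a normal build
}  // namespace
int from_a() { return helper(); }
}  // namespace a_unit

// ---- contents of hypothetical b.cpp ----
namespace b_unit {
namespace {
int helper() { return 2; }  // same name; would clash without the outer namespaces
}  // namespace
int from_b() { return helper(); }
}  // namespace b_unit

// Both "files" now live in the same translation unit.
inline int unity_total() { return a_unit::from_a() + b_unit::from_b(); }

// Usage check: each hypothetical file contributes its own value.
inline void check_unity() { assert(unity_total() == 3); }
```

One consequence is that an error in any included file is reported against the single combined compilation, which can make per-file statistics harder to interpret.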

@nautaa nautaa added the question Further information is requested label Oct 12, 2023
@MathiasVP
Contributor

Hi @nautaa,

Yes, this should be supported just fine. Is this project open source so that we can try to reproduce the extraction failure?

@MathiasVP MathiasVP added the acknowledged GitHub staff acknowledges this issue label Oct 12, 2023
Author

nautaa commented Oct 12, 2023

@MathiasVP Yes. I am running CodeQL in a GitHub Action; the full log is here: https://github.com/nautaa/oceanbase/actions/runs/6479648696

Some of the logs are as follows:

[E 09:39:37 10034] Warning[extractor-c++]: In construct_message: "/home/runner/work/oceanbase/oceanbase/deps/oblib/src/lib/restore/ob_storage_info.cpp", line 96: error: user-defined literal operator not found
      LOG_WARN("storage info init twice", K(ret));
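One common way a C++11-or-later frontend can arrive at a "user-defined literal operator not found" diagnostic is a string literal written directly next to an identifier, for example a macro name, with no space in between. Whether that is what the `LOG_WARN` expansion does here is an assumption; the thread does not confirm it. The sketch below uses a hypothetical macro `SUFFIX` (not the OceanBase definition) and shows a genuine user-defined literal for comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a macro that expands to a string literal.
#define SUFFIX " [ret]"

// Since C++11, "text"SUFFIX (no space) is lexed as ONE token: a string
// literal with the user-defined suffix SUFFIX. The macro is not expanded,
// and the compiler looks for a matching literal operator instead:
//   const char* bad = "storage info init twice"SUFFIX;  // ill-formed

// With a space, SUFFIX is a separate token, the macro expands, and the two
// adjacent string literals are concatenated as in C++98.
inline std::string concatenated() {
    return "storage info init twice" SUFFIX;
}

// A genuine user-defined literal: 4_kib calls this operator with n == 4.
constexpr unsigned long long operator""_kib(unsigned long long n) {
    return n * 1024;
}

// Usage check for both forms.
inline void check_literals() {
    assert(concatenated() == "storage info init twice [ret]");
    assert(4_kib == 4096);
}
```

Some compilers accept the spaceless form only when a related diagnostic is downgraded or disabled, so code that builds with one toolchain may still be rejected by a stricter frontend.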

Contributor

MathiasVP commented Oct 12, 2023

Thank you! Looking at the log output we do produce a database containing >3000 files. So it does seem like things are working as expected 🎉.

Your observation about some of those error messages in the log is correct. It does seem like there are certain constructs that we fail to extract. I'll create an internal issue to track this. In the meantime, would it be possible for you to rerun the workflow with debugging enabled? That will probably speed up our debugging process tremendously!

You can find information about how to rerun with debugging enabled here: https://github.blog/changelog/2022-08-01-debugging-codeql-analysis-in-code-scanning-made-easier-by-obtaining-detailed-logs-and-debugging-artifacts-from-the-codeql-action/

Author

nautaa commented Oct 12, 2023

@MathiasVP I have re-run this action. Thanks.

@MathiasVP
Contributor

Thank you 🙇. We'll keep an eye on the build and see if we can fix the errors reported during extraction.

I'd like to stress that, while there are errors reported in the log, it does seem like CodeQL is running correctly on your code. So the errors probably mean that, although certain expressions / functions / files are not available to the analysis, the analysis still runs perfectly fine.

Author

nautaa commented Oct 12, 2023

> Thank you 🙇. We'll keep an eye on the build and see if we can fix the errors reported during extraction.
>
> I'd like to stress that, while there are errors reported in the log, it does seem like CodeQL is running correctly on your code. So the errors probably mean that, although certain expressions / functions / files are not available to the analysis, the analysis still runs perfectly fine.

Yes, CodeQL runs perfectly fine on the files that are included, but the problem is that some files are not included.

Author

nautaa commented Nov 9, 2023

@MathiasVP Is there any progress on this issue?

@MathiasVP
Contributor

Hi @nautaa

A small update: since you opened the issue, we've updated our frontend, which is expected to have fixed a large number of issues. It may be the case that this has fixed your issue, but we won't know until next week since the new version of CodeQL hasn't yet been rolled out.

It's expected to start rolling out on 13 Nov. Would you mind checking if the issue still persists on Wednesday (i.e., 14 Nov) your time by running the workflow with debugging enabled? That should hopefully ensure that you'll be testing on the right version (i.e., CodeQL version 2.15.2).

Author

nautaa commented Nov 15, 2023

@MathiasVP Hi, I recently re-ran version 2.15.2 of CodeQL, but unfortunately the problem doesn't seem to be resolved. Even fewer files were scanned. There are still a lot of extraction warnings. Can you have a look again?

2.15.1 results: https://github.com/nautaa/oceanbase/actions/runs/6844298579/job/18652394580

CodeQL scanned 602 out of 4647 C files, 107 out of 4480 C++ files, 0 out of 105 Python files and 0 out of 52 Go files in this job.

2.15.2 results: https://github.com/nautaa/oceanbase/actions/runs/6844298579/job/18689398226

CodeQL scanned 233 out of 4647 C files, 35 out of 4480 C++ files, 0 out of 105 Python files and 0 out of 52 Go files in this job.

Contributor

jketema commented Nov 15, 2023

Hi @nautaa,

The particular warning you mentioned in #14479 (comment) should be resolved in CodeQL 2.15.3. More errors remain though.

Note that the page you're quoting in #14479 (comment) is not really representative of what is going on. All files are actually scanned. However, only the number of files mentioned there was scanned completely without parse errors. I would expect most code to be represented in the database we build during analysis.

Author

nautaa commented Nov 16, 2023

@jketema, thanks for your reply. As you said, most of the code has been scanned. But the number of lines of code reported in the logs is only a small portion of the entire code base. As far as I know, the entire codebase has about 3 million lines of code, but only about 300,000 lines show up in the logs?

|                  Metric                   | Value  |
+-------------------------------------------+--------+
| Total lines of C/C++ code in the database | 388544 |

Other than that, I'd like to ask: does this mean CodeQL won't handle the remaining errors you mentioned that show up during the scan?

Contributor

jketema commented Nov 16, 2023

> @jketema, thanks for your reply. As you said, most of the code has been scanned. But the number of lines of code reported in the logs is only a small portion of the entire code base. As far as I know, the entire codebase has about 3 million lines of code, but only about 300,000 lines show up in the logs?

I'm not sure what to make of that number, and I'm trying to get some internal clarity on it. Does your build compile all 3 million lines of code, or does it compile less?

> Other than that, I'd like to ask: does this mean CodeQL won't handle the remaining errors you mentioned that show up during the scan?

I've opened an internal issue to solve them, but this will take time. The problem is that the oceanbase code uses some quite esoteric C/C++ features that sometimes only work by the grace of certain warnings being disabled.

Author

nautaa commented Nov 16, 2023

@jketema

> I'm not sure what to make of that number, and I'm trying to get some internal clarity on it. Does your build compile all 3 million lines of code, or does it compile less?

The vast majority of the 3 million lines were compiled.

> I've opened an internal issue to solve them, but this will take time. The problem is that the oceanbase code uses some quite esoteric C/C++ features that sometimes only work by the grace of certain warnings being disabled.

I see. It may not be appropriate to use CodeQL on the OceanBase codebase at the current time.

Contributor

jketema commented Nov 16, 2023

> I see. It may not be appropriate to use CodeQL on the OceanBase codebase at the current time.

I think that is an incorrect conclusion because, as I stated above, a lot more code is being analysed than the numbers suggest.

Author

nautaa commented Nov 16, 2023

> > I see. It may not be appropriate to use CodeQL on the OceanBase codebase at the current time.
>
> I think that is an incorrect conclusion because, as I stated above, a lot more code is being analysed than the numbers suggest.

Ok, I see. Thanks again!
