Ruby scanning job hangs forever and doesn't complete on Ubuntu-latest #12349
Comments
@jedrekdomanski Thanks for reporting. It looks like your repository runs into a performance issue with one of the queries. Did things work for you in the past, or did you just set up CodeQL analysis for your repository? If things used to work, could you try running a previous version of CodeQL as a workaround? This can be done by setting the
Could you try:
- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v2
  with:
    category: "/language:${{matrix.language}}"
  env:
    CODEQL_ACTION_EXTRA_OPTIONS: '{ "database": { "analyze": ["--timeout", "600"] } }'
This should limit runs of
If this is an open source repository, could you share the URL and any debug artifact so we can investigate? If this is a closed source repository, please contact GitHub support to continue this conversation via an internal support ticket. |
Thank you for your quick reply. It was never successful before; we've only just started running the scanning jobs in the project. I'll try your suggestions. In the meantime, here's the full log of the job, which I ran in debug mode. |
Thanks for the quick reply. Unfortunately, the URL you posted had expired before I could download it. |
I've attached the logs of a failed job below. |
Here is another output file of a job that's just failed. |
Looking at the output of the "resolve files" command, it seems like your repository is quite large. Most likely CodeQL is running low on memory due to the size of the repository, which causes it to slow down. You could try running the analysis on a larger runner or a self-hosted one: Using larger runners.
Another thing to try is to reduce the number of scanned files. The
If reducing the files and increasing the RAM do not work, then it would be helpful to do the following:
|
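For reference, restricting the scan to a subset of directories is usually done in a separate CodeQL configuration file. The following is a minimal sketch, assuming the file lives at `.github/codeql/codeql-config.yml` (the path that appears in a later snippet in this thread) and that `app` and `lib` are the directories of interest (the ones named further down). The `name` value and the `paths-ignore` entries are placeholders, not taken from the reporter's project.

```yaml
# .github/codeql/codeql-config.yml (illustrative)
name: "CodeQL config"

# Only files under these directories are analyzed.
paths:
  - app
  - lib

# Optionally exclude third-party or generated code (hypothetical entries).
paths-ignore:
  - vendor
  - node_modules
```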
@aibaars I've tried reducing the number of scanned files but this doesn't work. The documentation says to add this:
And so says the example config file here https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-code-scanning#example-configuration-files
So I added this to the root namespace in my config file:
but I get an error
|
I think you need to put the configuration in a separate file and refer to it using the
- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v2
  with:
    category: "/language:${{matrix.language}}"
    config-file: ./.github/codeql/codeql-config.yml
  env:
    CODEQL_ACTION_EXTRA_OPTIONS: '{ "database": { "analyze": ["--timeout", "600"] } }'
|
So I am surprised... that I do what the docs say and it doesn't work. |
I'm sorry, the yaml snippet I included is wrong. The
|
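A sketch of what a corrected setup might look like, assuming the usual two-step layout of `github/codeql-action@v2`: the `config-file` input is read by the `init` step rather than by `analyze`. The config path and the matrix variable come from the snippets above; the step names and the `languages` line are illustrative.

```yaml
- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: ${{ matrix.language }}
    # config-file is an input of the init action, not of analyze.
    config-file: ./.github/codeql/codeql-config.yml

- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v2
  with:
    category: "/language:${{matrix.language}}"
  env:
    CODEQL_ACTION_EXTRA_OPTIONS: '{ "database": { "analyze": ["--timeout", "600"] } }'
```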
Is this a correct configuration?
config file
|
Anyway, it doesn't seem to improve the runtime at all and
doesn't look like it has any effect.
|
My bad, I thought the codeql-action ran the Could you try:
|
The
|
Unfortunately, it doesn't work
My config looks like this:
|
I think that configuration looks good. Could you also set the following globally (near the top of the workflow):
That should print a line for each file that is scanned. |
Sorry, it should include the hostname too of course: |
Thank you. Here's what I see now:
and
Full logs. |
Ok, so even when only scanning app and lib the analysis still fails? Have you tried with a runner with more RAM? There are a couple more things to try to make CodeQL run with a single thread which may require less RAM. Add the following to the top of the workflow:
Add
and upload
If you (re)run the workflow in debug mode, it also uploads a debug artifact. This can be used for diagnosing problems. Note that it contains a copy of the scanned source code, so do not attach it to this public issue. You can attach parts of it of course, just be careful not to leak information you would like to keep private. |
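The exact settings suggested in this comment are not shown. As an illustration only, one way to constrain CodeQL's resource use is through the `threads` and `ram` inputs of the `analyze` action. The values below are assumptions, not the ones originally proposed.

```yaml
- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v2
  with:
    category: "/language:${{matrix.language}}"
    # Evaluate queries on a single thread (assumed value).
    threads: 1
    # Memory available to CodeQL, in MB (assumed value; size it to the runner).
    ram: 6000
```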
Yes, despite limiting the directories to scan to app and lib it still fails. I don't know how to use a runner with more RAM. I didn't find any documentation on how to do that. We don't have our own runners. Is it possible to increase RAM? I added the code you suggested but it still failed with the same error (timeout after ~ 16 minutes). Error log:
|
I added the step upload artifact but it doesn't work.
I didn't know how to do that so I found this documentation https://github.com/actions/upload-artifact#upload-an-individual-file but it doesn't work |
I think you got the indentation wrong:
- name: Step 3 - Use the Upload Artifact GitHub Action
  uses: actions/upload-artifact
  with:
    name: my-artifacts
    path: ${{ runner.temp }}/evaluator.log
|
Or perhaps the problem is that you forgot the |
See Using larger runners for information. You can also try on a local machine (Linux, Windows, or macOS):
|
Try adding |
It still fails
|
That's indeed strange, I get the same error. Try with |
Counting only files in
that's 135874 in total. |
Thanks, we'll have a look at the log. |
FYI you can get a more readable log by running
|
By default steps do not run if a previous step has failed. However, you can change this by adding
|
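To illustrate running a step after a failure: GitHub Actions accepts an `if:` condition on a step, and `always()` makes the step run even when an earlier step failed or the run was cancelled. The sketch below reuses the log path from the upload snippet earlier in the thread. The step name, the artifact name, and the `@v3` pin are assumptions.

```yaml
- name: Upload evaluator log
  # Run this step even if the analysis step failed, timed out, or was cancelled.
  if: always()
  uses: actions/upload-artifact@v3
  with:
    name: evaluator-log
    path: ${{ runner.temp }}/evaluator.log
```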
If you have time, could you add
After these changes, trigger the workflow and cancel the run manually after a few minutes, to test that the log is indeed getting uploaded. If that works, then re-run the job in debug mode. It should stop after roughly 2 hours and upload a log file. |
It failed with
I see the command that was used to run codeql was:
and one of the options used to run it was |
That should indeed have been |
Memory is shared by the different threads, so if memory is getting low the 2 threads may be competing for resources and make things even worse. On the other hand |
We progressed a bit further and reached as far as
When I reduce the number of files to scan only to |
Thanks for the update. Could you attach the evaluator log of the 2 hour run?
That's nearly done. Could you retry with a timeout of 5 hours? It would be really great to get an evaluation log of a completed run and it looks like that may be possible.
That is really good to know. The RAM and CPU allocations should be the same; the spec of the Actions VM hasn't changed. Let's try with the CodeQL version of September: https://github.com/github/codeql-action/releases/tag/codeql-bundle-20220923 by setting
We do performance tests on over 2000 repositories for each release, but perhaps your code base has some code patterns that confuse the analyzer for some reason. Do you know if any of the files in the |
No, we don't have such code in our repo :) I'll try to run it for 5 hours again and send you the logs. |
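A five-hour run could be configured roughly as follows. This sketch assumes that the `--timeout` value from the earlier snippet is expressed in seconds, and that the job-level `timeout-minutes` must be larger than that limit so the job is not killed first. The job name and the 330-minute value are illustrative, and the checkout and init steps are omitted.

```yaml
jobs:
  analyze:
    runs-on: ubuntu-latest
    # Job-level limit in minutes; kept above the query timeout below.
    timeout-minutes: 330
    steps:
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2
        with:
          category: "/language:${{matrix.language}}"
        env:
          # 5 hours = 18000 seconds (assumes --timeout takes seconds).
          CODEQL_ACTION_EXTRA_OPTIONS: '{ "database": { "analyze": ["--timeout", "18000"] } }'
```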
@aibaars Do you have any update on this? |
A colleague of mine just informed me
The logs suggest that there are quite a lot of end-points in your application. Does that sound right? As you can imagine, it is quite hard to create an example database to reproduce the same problems as you are experiencing on the real one. |
We have 54 API endpoints in our app. How much is "a lot" for you, and how much can CodeQL handle? Will using a larger runner solve our problem? |
That is not really a lot. I would consider hundreds or thousands of end-points "a lot". A large runner should work better, and may be able to complete the analysis. However, I suspect there is something in your code base that somehow "confuses" CodeQL, so I don't expect great performance even with a large runner. Still worth a try though. Have you had a chance to try with an old version of CodeQL, for example the September version?
|
I've just tried it and it completed within less than 3 minutes on our latest code base on both We are at version 0.4.0 now. |
This is the last version that works (October 10th) https://github.com/github/codeql-action/releases/download/codeql-bundle-20221010/codeql-bundle-linux64.tar.gz |
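Pinning the workflow to that bundle can be sketched as follows, assuming the `tools` input of the `init` action, which accepts the URL of a CodeQL bundle. The URL is the one quoted above; the step name and the `languages` line are illustrative.

```yaml
- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: ${{ matrix.language }}
    # Pin the CodeQL bundle to the last release reported to work in this thread.
    tools: https://github.com/github/codeql-action/releases/download/codeql-bundle-20221010/codeql-bundle-linux64.tar.gz
```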
@jedrekdomanski Sorry that this is taking so long. Unfortunately, we have not been able to reproduce the issue you are experiencing. The relevant change in the October 24th version is likely the improvements to the call graph (matching method calls with method definitions). However, the call graph computation itself is not slow in the log file, so possibly there is a problem that was unreachable before but became reachable due to the changes in the call graph. The most effective way to continue the investigation would be to have a copy of the CodeQL database. Would it be possible for you to share that with GitHub engineers? Note that a CodeQL database contains a copy of the analyzed source code, so:
|
It's not possible for us to share our source code; our policy doesn't allow this. The fact that you were not able to reproduce the problem tells me that you were not using the exact same environment/database/image, etc. as on GitHub. The problem does exist, but for some reason not for you. I don't have visibility into how all this works on GitHub, so I might be wrong, but why should we involve GH engineers in this? Can you run it on the same image as it currently runs on GH? |
Makes sense, I expected that would be the case.
I don't think the problem is related to the image we are running. Standard GitHub Actions runners, the larger ones, and even your local machine all showed a slowdown when running a newer version of CodeQL to analyze your source code. We regularly test CodeQL against a few thousand open source repositories and did not see a significant slowdown. Therefore, there must be something specific in your code base that triggers a rare bug/corner case in our analysis. We would really like to find out the cause of the problem, but debugging is going to be hard without access to the source code. One thing we could do is provide you with a number of CodeQL queries to get some statistics about your code base (for example number of functions, size of the call graph, size of control flow graph, etc). These queries can be run with the |
Yes, I can try it. Do you know what changed in the version that stopped working for us? Our code base did not change on the day you introduced the new version of CodeQL which stopped working for us so I think we should focus on looking into the differences in the versions and figure out what's causing the slowdown. What's interesting, though, is that this project is not the only one in our organisation and the latest version of CodeQL works fine for other projects. |
Glad to hear that things work fine for other projects! The changes between the versions should be more or less codeql-cli-2.11.1...codeql-cli-2.11.2 . Unfortunately, quite a lot was changed in that period. |
Great! The next step would be to switch to a private communication channel instead of this public issue ticket to work together on this problem. I think the easiest options are
I think a private repository would be the best way to collaborate and exchange files. It would be best if the repository is created by you or your organization. The data we'll be exchanging is yours, and it is best if you control who has access to it. If you can't create a private repository for some reason then I'd be happy to create one for the collaboration. If you don't want to use a repository, then please create a support ticket and mention this public issue and ask the support team to route the issue to me ( |
@aibaars
The command line I use is
Then I upgraded CodeQL to 2.12.5, recreated the database, and executed the query. It is still stuck, same as above, but the progress bar went to the last f
I can share the JS code or database and the QL file. How should I send them to you?
@jedrekdomanski My colleague @asgerf implemented a query to help diagnose performance problems: #12689 Would you be able to run this query? Get a copy of CodeQL from https://github.com/github/codeql-cli-binaries/releases and run the following commands:
Note that the query only extracts some numbers; it does not leak method or file names, so it should be safe to attach the results to this public issue ticket. |
Hello,
We have set up a CodeQL code scanning job in our Ruby project, and it takes over 6 hours to run and never completes. I have tried using the default queries as well as security-extended and security-and-quality, but they hang forever and never complete. We run two jobs (for Ruby and JavaScript) using a language matrix. This is our codeql-analysis.yml file. Currently timeout-minutes is set to 25, but only to limit the run time and cut the cost of the job, because we pay for it and it never completes. It was previously set to 6 hours, but it didn't complete then either.
Here are some logs. As you can see, it just sits there and does not progress at all.