How to extract source files when using a special compiler (e.g. TMS320C2000 C/C++ Compiler)? #8453
Comments
The problem is that CodeQL does not recognize The configuration file
The rule above informs CodeQL that it should run the C/C++ "extractor" whenever a binary named You could copy this file and try to add a rule for
Note that compiler specifications are an advanced feature. Adding a rule will cause the CodeQL tracer to intercept |
Hi @aibaars, thank you for replying. I modified How could this happen? Did I do anything wrong when defining the compiler rule? Can you help me with that? |
@li-xin-yi It looks like CodeQL is now intercepting the compiler calls and running the "extractor" on the source files. The "extractor" copies the analysed source file to the "source archive" (this bit is working) and parses the source file to produce "trap" files (trap files are later imported into the databases). Most likely, the extractor fails to parse the source files, causing the trap files to be incomplete or even empty. In the screenshot you attached I see the As I said earlier:
It would be good to know what causes the failures. It could be something simple like a Could you please file a separate feature request issue for CodeQL support for the TMS320C2000 C/C++ Compiler? If you have an enterprise account, it's probably best to file the feature request through enterprise support. We keep track of all feature requests, but when/if a feature gets implemented depends on priorities. In the short term you might want to try the following:
The |
Thank you @aibaars, I looked into
It treats the compiler's include files as code containing syntax errors; however, they can actually be compiled by By the way, I really like CodeQL and have applied it to other high-level development projects. I just want to extend its use to work involving low-level embedded systems as well. |
The One way to make this particular error disappear is to redefine the
would do the trick. Otherwise, it should be possible to let a Another thing to try would be to add. That should make the extractor carry on regardless of any errors. The drawback is that your database may have many "gaps".
This post seems related : |
I'm trying to get the same thing working with the TMS320C2000 C/C++ Compiler but I'm not having any luck using codeql v2.15.1. Have things moved on since this post? I don't see FWIW, I'm attempting to get the extractor working on Linux. It would be wonderful if someone could guide me on how to get codeql to successfully parse and extract using this compiler 😍 I've tried following this thread #10132 (comment) and it seems to trip up when there are command line arguments on the intercepted c2000 compiler command:
|
The "compiler specification" files have been superseded by tracing configuration Lua scripts. The flag is This is an example of a custom tracing script used for C# to inject some custom flag into an Another example of a tracing specification: https://github.com/github/codeql/blob/main/go/codeql-tools/tracing-config.lua I don't think there is any public documentation at the moment on how to write extra tracing configuration files. You can use the following template as a starting point, save it as for example
function GetCompatibleVersions() return {'1.0.0'} end
function RegisterExtraConfig()
return {
['cpp'] = {
DEFINE_MATCHERS_HERE,
table.unpack(_RegisteredMatchers['cpp']) -- include default matchers, if needed
}
}
end
You can run You could make a tracing configuration file that intercepts |
Hi @aibaars thanks for the detailed reply. I can intercept cl2000 commands with a custom tracing Lua script, albeit by intercepting the /bin/sh pattern, as that's the binary the interceptor sees when I run make all. The next hurdle is getting the extractor to do the right thing and pass arguments along in the right way. Because I couldn't find the extractor source code (and there isn't any help documentation on the binary), I've tried to tinker with sending commands directly to the extractor. For example, building a single source file with the custom cl2000 compiler requires several include paths for my domain:
/opt/ti/ccs/tools/compiler/c2000_x/bin/cl2000 -v28 --include_path/opt/ti/ccs/tools/compiler/c2000_x/include --include_path=/src --include_path=/opt/ti/ccs/bios/packages/ti/bios/include --include_path=/opt/ti/ccs/bios_x/packages/ti/rtdx/include/c2000 --include_path=/opt/ti/ccs/xdais ./custom.c
Trying to translate this command to something the extractor can process correctly, I've noticed that there's a
/root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic /opt/ti/ccs/tools/compiler/c2000/bin/cl2000 -c './custom.c' --compiler -I/opt/ti/ccs/tools/compiler/c2000/include --compiler -I/src --compiler -I/opt/ti/ccs/bios/packages/ti/bios/include --compiler -I/opt/ti/ccs/bios/packages/ti/rtdx/include/c2000 --compiler -I/opt/ti/ccs/xdais
With this command, the extractor attempts the following:
[E 06:33:32 5840] Processed command line: /root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic_config compiler_mimic_cache/fdf7fe580b4c -w --error_limit 1000 --disable_system_macros --variadic_macros --c89 --ti '-D__signed_chars__=1 /* Predefined */' '-D__DATE__="Nov 14 2023" /* Predefined */' '-D__TIME__="06:32:09" /* Predefined */' '-D__edg_front_end__=1 /* Predefined */' '-D__TI_COMPILER_VERSION__=6001003 /* Predefined */' '-D__COMPILER_VERSION__=6001003 /* Predefined */' '-D__TMS320C2000__=1 /* Predefined */' '-D_TMS320C2000=1 /* Predefined */' '-D__TMS320C28XX__=1 /* Predefined */' '-D_TMS320C28XX=1 /* Predefined */' '-D__TMS320C28X__=1 /* Predefined */' '-D_TMS320C28X=1 /* Predefined */' '-D__SIZE_T_TYPE__=unsigned long /* Predefined */' '-D__PTRDIFF_T_TYPE__=long /* Predefined */' '-D__WCHAR_T_TYPE__=unsigned int /* Predefined */' '-D__little_endian__=1 /* Predefined */' '-D__TI_STRICT_ANSI_MODE__=1 /* Predefined */' '-D__TI_WCHAR_T_BITS__=16 /* Predefined */' '-D__TI_GNU_ATTRIBUTE_SUPPORT__=0 /* Predefined */' '-D__TI_STRICT_FP_MODE__=1 /* Predefined */' -D_INLINE=1 -D_OPTIMIZE_FOR_SPACE=1 -I/src -I/opt/ti/ccs/bios/packages/ti/bios/include -I/opt/ti/ccs/bios9/packages/ti/rtdx/include/c2000 -I/opt/ti/ccs/xdais/packages/ti/xdais -I/opt/ti/ccs/tools/compiler/c2000/bin/../include/ -- ./custom.c
which doesn't compile (several undefined identifiers, but even so, it did manage to include those directories, which solved one breaking issue). I'm not sure, but is there a way to pass an argument to the extractor and have it pass that as-is on to the compiler? For instance attempting I'm unclear still however, is the extractor running the gcc compiler with the above arguments? If it is, is there any way to swap out the gcc compiler for the cl2000 compiler and tell the extractor to run that instead? Or are you saying the only way forward is to write middleware that drops any non-standard gcc arguments and patches out any non-standard C code in every file in the repo before the extractor runs (which is probably too big a challenge to be worth the effort, I'm afraid, given it's an existing complex app with many dependencies on cl2000 constructs)?
As an aside: this Lua script has a lot of useful components if you need to transform commands before they're executed by the extractor: https://github.com/microsoft/codeql/blob/d9364c060e8897bb907b05feef458c3892ee38a2/csharp/tools/tracing-config.lua |
@TomShirley Unfortunately, the C++
I'm afraid I don't know the details of the C++ extractor myself. Roughly, the CodeQL tracer intercepts all processes, matches their commands and arguments against the tracer configuration (Lua script), and based on the configuration decides what to do with each command. If a command matches the pattern of a compiler, then the tracer typically lets the compiler process proceed as normal, but in addition starts an "extractor" process based on the compiler's command and arguments. An "extractor" behaves a lot like a compiler, but instead of generating code, it produces "trap" files, which are text files that describe the information that should go into the CodeQL database. The C++ extractor has a ton of command line flags to make it behave as closely as possible to the intercepted compiler. It needs to know the include paths of the compiler's standard libraries, information about the architecture, the definitions of all sorts of macros, the variant of C/C++, any syntactic extensions of the compiler, and many more things. This is a lot of work to configure by hand; therefore, the "extractor" typically runs in two steps: first in "mimic" mode, which tries to run the compiler's command in various ways to detect how it behaves, and then in "real" mode, which runs the "extractor" with the original arguments of the compiler and all the additional configuration flags derived in the "mimic" phase. You might want to perform a few experiments with some small programs compiled with the supported compilers and have a look at the contents of the "mimic cache". You can do the same for the The "extractor" knows about (most of) the flags of the supported compilers and whether to interpret them or ignore them. For an unsupported compiler you need to drop or rewrite any of the command line arguments that the extractor does not understand. This is roughly all I know about how things work ;-).
It will be quite a bit of trial and error, but you can probably get a prototype implementation working this way. Consider contacting the Expert Services team for additional help. |
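The "drop or rewrite any arguments the extractor does not understand" step described above could be prototyped outside the tracer first. The following is only a minimal sketch under stated assumptions: the function name `translate_cl2000_args` and the `CL2000_FLAGS_TO_DROP` set are hypothetical, the only rewrite shown is `--include_path=DIR` to `-IDIR`, and dropping `-v28` is a guess for illustration rather than something the extractor is known to require.

```python
from typing import List, Sequence

# Assumption for illustration: flags with no gcc-style equivalent are dropped.
# Only flags that appear in this thread are listed here.
CL2000_FLAGS_TO_DROP = {"-v28"}

INCLUDE_PREFIX = "--include_path="


def translate_cl2000_args(args: Sequence[str]) -> List[str]:
    """Rewrite cl2000-style arguments into gcc-style ones.

    Arguments that are neither dropped nor rewritten pass through unchanged.
    """
    translated: List[str] = []
    for arg in args:
        if arg in CL2000_FLAGS_TO_DROP:
            # Skip flags we assume the extractor cannot interpret.
            continue
        if arg.startswith(INCLUDE_PREFIX):
            # --include_path=/src  becomes  -I/src
            translated.append("-I" + arg[len(INCLUDE_PREFIX):])
        else:
            translated.append(arg)
    return translated


if __name__ == "__main__":
    original = ["-v28", "--include_path=/src", "--include_path=/opt/ti/ccs/xdais", "./custom.c"]
    print(translate_cl2000_args(original))
```

A real tracing configuration would have to do the equivalent rewriting in Lua; the Python version is just a way to check the mapping against real build commands before porting it.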
@aibaars interesting! I've tried
an example mimic_cache file:
The extractor doesn't appear to do anything with this file; even if I pass in a blank file to the extractor, it will just compile with gcc and not use any of the contents of this file. If you know any more about how to instruct the extractor to use a mimic_cache file, please share |
Ah yeah, you're right. I thought the extractor would take the information from that file and interpret it somehow. However, when I look at the example command line you gave earlier, it looks like the contents of that file have somehow been injected in the command line mostly as
These flags look specific to the mimicked compiler (I suppose |
Yes, it does seem that the extractor with --mimic will indeed run the cl2000 compiler as a first pass to glean info like the -D arguments, but then I assume it runs gcc next. Without a way to instruct the extractor to not use gcc for the actual compile step but rather use the provided compiler, I might be at a dead end. Regardless, thanks for your helpful replies in getting to this point 🙏 |
The extractor does not do the actual compile step. The CodeQL tracer intercepts the "exec" system call that would run the actual compile step. A typical tracer configuration would run the actual compile step first without changing any of its arguments, followed by an "extractor" call with arguments based on the ones from the intercepted "exec". As a result the build would run as it normally would, but in addition it also runs the extractor for each compilation step. |
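The ordering described above (the real compile step runs unchanged, then an extractor call derived from the same arguments) can be pictured with a toy model. This is a sketch for intuition only and not how the CodeQL tracer is implemented: `plan_for_exec`, the pattern list, and the bare `extractor` program name are all hypothetical, and only the `--mimic` flag is taken from the commands quoted in this thread.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def plan_for_exec(argv: Sequence[str],
                  compiler_patterns: Sequence[str],
                  extractor: str = "extractor") -> List[List[str]]:
    """Return the commands a tracer-like wrapper would run for one intercepted exec.

    Non-compiler commands run as-is. Compiler commands run as-is first,
    followed by an extractor invocation built from the same arguments.
    """
    program = argv[0]
    basename = program.rsplit("/", 1)[-1]
    is_compiler = any(fnmatchcase(basename, pattern) for pattern in compiler_patterns)
    if not is_compiler:
        return [list(argv)]
    extractor_call = [extractor, "--mimic", program, *argv[1:]]
    return [list(argv), extractor_call]


if __name__ == "__main__":
    for command in plan_for_exec(["/opt/ti/bin/cl2000", "-v28", "./custom.c"], ["cl2000"]):
        print(command)
```

The point of the model is that the build output is unaffected by the extractor: the original command is always the first entry in the plan, whatever happens to the second.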
I agree that the normal (non-codeql) compile step is being run with the TI compiler, which originates from the
which are the same errors I get when running gcc:
whereas running the TI compiler with
runs fine and compiles. So this leads me to believe that the extractor, as part of doing the trap compilations, is using gcc and not my passed-in --mimic binary. Maybe that's not right, but I can't seem to force the extractor to use the TI compiler for the trap compilation using |
I understand the confusion. If you look at the messages then they are similar to gcc but not exactly the same. The Where is this |
Ah, that's interesting! The missing Arg type is defined in
#if defined(_54_) || defined(_6x_)
typedef Int Arg;
#elif defined(_55_) || defined(_28_)
typedef void *Arg;
#else
I've since been able to solve these sorts of missing types by adding an The next issue I'm hitting is that TI has this sort of preprocessor code throughout many of their core header files:
#ifdef __cplusplus
extern "C" namespace std {
#endif
And it seems that the compiler that the extractor calls is internally running in C++ mode (with this constant defined), and this is breaking compilation: I need the compiler to not define it, as I'm building C code. Essentially, I need a way to instruct the extractor to run in C mode. I've tried passing in the option
[E 12:14:47 42445] Warning[extractor-c++]: In construct_message: Warning: "__cplusplus" is predefined; attempted redefinition ignored
Getting closer.. |
Getting closer indeed. There is probably some flag to toggle C or C++ mode. Not sure what it is though. You could try comparing a simple gcc vs g++ command and look in the build-tracer.log for ideas |
ok there seems to be a way to do this via /root/bin/codeql/codeql/cpp/tools/linux64/extractor --mimic /usr/bin/gcc -I/path -D__PTRDIFF_T_TYPE__=long -D '__SIZE_T_TYPE__=unsigned long' -D__LARGE_MODEL__=1 -D_TMS320XX=1 -D__WCHAR_T_TYPE__=long ./custom.c or without |
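For anyone wanting to script the workaround above, here is a minimal sketch that assembles the same style of command from a list of include directories and a table of macro definitions. The helper name `build_extractor_command` is hypothetical, the only extractor flags used are the ones that appear in the command quoted above (`--mimic`, `-I`, `-D`), and the macro values are copied from that command rather than verified against the TI headers.

```python
import shlex
from typing import Dict, List, Sequence


def build_extractor_command(extractor: str,
                            mimic_compiler: str,
                            include_dirs: Sequence[str],
                            defines: Dict[str, str],
                            source: str) -> List[str]:
    """Assemble an extractor argv: --mimic, then -I flags, then -D flags, then the source."""
    command = [extractor, "--mimic", mimic_compiler]
    command += [f"-I{directory}" for directory in include_dirs]
    # Each macro is emitted as a single -Dname=value argument.
    command += [f"-D{name}={value}" for name, value in defines.items()]
    command.append(source)
    return command


if __name__ == "__main__":
    argv = build_extractor_command(
        extractor="/root/bin/codeql/codeql/cpp/tools/linux64/extractor",
        mimic_compiler="/usr/bin/gcc",
        include_dirs=["/path"],
        defines={
            "__PTRDIFF_T_TYPE__": "long",
            "__SIZE_T_TYPE__": "unsigned long",
            "__LARGE_MODEL__": "1",
        },
        source="./custom.c",
    )
    # shlex.join quotes arguments containing spaces so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the command as a list and only joining it for display avoids quoting mistakes with values such as `unsigned long`.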
When I first created a database with codeql-cli for the C/C++ source files of some embedded system software generated by Code Composer Studio, it failed to extract any source file even though the gmake build process ran successfully.
I found that the root cause is that the makefile uses the TMS320C2000 C/C++ Compiler as the compiler for the project. Even when I only have the helloworld.c as:
build.sh as
"C:\ti\ccs1100\ccs\tools\compiler\ti-cgt-c2000_21.6.0.LTS\bin\cl2000" helloworld.c
And run the command as:
codeql database create cpp-database-01 --language=cpp --command="bash build.sh"
No source file can be extracted and the error message shows:
I guess the extractor in codeql fails to monitor the compilation process of the TMS320C2000 C/C++ Compiler, which technically can be traced, as it takes source/lib/include files in a similar way to other compilers (e.g., gcc, clang).
I really want to apply codeql to code compiled by special compilers like this to do some security analysis, but I am stuck on the extraction of source files. How can I fix the problem? Can I change some extractor configuration? Is there any way to modify the source code of codeql to generalize the compiler requirement? Or can I add the source files to the database manually?
Thank you very much.