-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPP: Flow Into Barrier Guards #10011
Comments
Hi @bdrodes, There are a couple of ways to handle this. Sadly, the override predicate isBarrier(Node node) {
exists(GuardCondition guard |
// The node is only reached when the value of `guard` is true.
guard.controls(node.asExpr().getBasicBlock(), true) and
// and the value of the guard will always evaluate to a value given by a call to `MyGuard`.
globalValueNumber(guard).getAnExpr().(Call).getTarget().hasName("MyGuard")
)
} Note that this uses a couple of libraries that you need to import explicitly: import semmle.code.cpp.controlflow.Guards
import semmle.code.cpp.valuenumbering.GlobalValueNumbering For a full example, this query: /**
* @kind path-problem
*/
import cpp
import semmle.code.cpp.dataflow.DataFlow::DataFlow
import semmle.code.cpp.controlflow.Guards
import semmle.code.cpp.valuenumbering.GlobalValueNumbering
import PathGraph
class Conf extends Configuration {
Conf() { this = "Conf" }
override predicate isSource(Node source) { source.asExpr().(Call).getTarget().hasName("source") }
override predicate isSink(Node sink) {
exists(Call c | c.getTarget().hasName("use") | c.getAnArgument() = sink.asExpr())
}
override predicate isBarrier(Node node) {
exists(GuardCondition guard |
guard.controls(node.asExpr().getBasicBlock(), true) and
globalValueNumber(guard).getAnExpr().(Call).getTarget().hasName("MyGuard")
)
}
}
from Conf conf, PathNode source, PathNode sink
where conf.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "" won't raise any alerts on the following example: int source();
void use(int);
bool MyGuard(int);
void test() {
int x = source();
if(MyGuard(x)) {
use(x);
}
int y = source();
bool res = MyGuard(y);
if(res) {
use(y);
}
} I hope that helps! |
Thanks @MathiasVP I'll experiment with this a bit. I'm totally unfamiliar with the GlobalValueNumbering module. Sounds like SSA. I'll have to do a bit of research. |
My only general concern is if this approach will work for more complex control flow with respect to calculating the guard. The example I gave was purposefully simplified. |
Let me know if you discover that the suggested approach doesn't work for your use case. In that case, I'm sure we can find something else that will work :) |
Here's an example of what I mean by more complex flow where I'm not sure how to apply the global value numbering solution yet. In my solution I've defined a way to parse guard expressions to determine if a particular use of I've defined an isValid and isInvalid check, so I know which branch of the guard is the 'sanitizer/barrier'. The signature basically looks like this:
This is problematic for cases like this below:
I'm not sure how to use a GlobalValueNumbering here, or if its the right general solution for cases like this. If I would guess, I'd say I need to find a sub-expression of the guard where the number value is MyGuard, then write a new predicate that would parse the computed subexression in a new |
For more context, this is the style of barrier guard I was using. I had to define my own extension of GuardCondition to ensure I got the full guarding condition, as opposed to a nested expression.
|
Thinking on this more, it just seems like the global value solution will eventually require me to make a separate data trace since the values in my examples may flow arbitrarily into a guard (consider the example below). In these examples I may have go more than 1 step back in the global value numbering to determine if the guard is indeed guarding a use correctly. I can try to make a recursive global value look up, but it at this point I think I'm re-inventing dataflow. I suppose I could put a dataflow inside my dataflow, but this also seems awkward.
|
Final thought. If a data path must be considered into the guard as well, then it is also unclear if there is any way to establish the isValid or isInvalid on the guarding condition (i.e., which branch is safely validating the use), since the full expression needed to be evaluated is found in the composite of the data path. To clarify our use case, I'm constantly hitting this pattern of guard and need to make a more robust flow solution. Each use case might have different assumptions we can make on common use, but generally speaking, I have to solve this same problem over and over for different projects with different guarding conditions, so I'm looking for how robust I can make a query in general to fix the general problem of flow into a barrier/sanitizer guard. |
Alright, I think I have a way forward here. I've applied your global value numbering solution into my isValid and isInvalid predicates I mentioned. I've updated them to not only recurse over 'children' of the expression, but also to consider a child of the expression anything that is in the global value number set for a subexpression. In effect, the isValid and isInvalid do a backwards data trace in evaluating a guard's expression as sanitizing on the false or true branch. As a result, I have to add it to the isSanitizer for the trace config, not barrier guard as you suggested, but then this renders the barrier guard moot, so barrier guard can be removed entirely from the tracer. This sounds like the 'right' direction for a general solution given you original suggestion, but I have some concerns about the implied data trace within a data trace (i.e., the sanitizer now does a backwards data trace every invocation). I've used the I also have concerns that to address barrier gaurds generally, we actually don't use any notion of barrier guard, which seems counterintuitive. All that said, if this generally sounds like the 'codeql pardigm' for this general problem, we can call the thread closed, and I'll share this concept with my team. |
@MathiasVP Continuing to play with this, and I think this direction is very good and also solves associating aliases with guarded conditions. The issues I see now is not every language has a global value numbering library (as far as I can tell), but I assume there is a similar solution with SSA. I'm going to close this thread for now. Thanks. |
Hi @bdrodes, Sorry I'm only responding now! I'm happy you were able to find a satisfying solution that worked for you :) Indeed, not every language has a global value numbering library, and I'm sure you could do a similar solution with SSA for those languages (you could conceivable do it for C/C++ as well). Let me try to comment on a couple of points you raised:
If this works for you that's great! There might be a potential performance problem by considering every expression with the same value number as a subexpression of some expression
(i.e., But with your value number encoding, the table will look like:
where Once again, if it works for you that's great! But be wary of your analysis exploding :)
You shouldn't really need a The CodeQL compiler does cache the results automatically, but this is sometimes broken by optimizations: One of the optimizations that CodeQL does is that it specializes your predicates based on the context it's used in, and that might prevent caching of intermediate results. The
Yes, barrier guards doesn't work super well for C/C++ at the moment, and we're actively working on improving this situation. I hope that you'll soon be able to write the barrier guard you expect and get the right behavior automatically. I'll be sure to note this thread down in our internal issue about this 🙇. |
@MathiasVP It's interesting you say CPP in particular is bad with barrier guards, but from what I can tell, the problem I raised is the same in every language. Is there a language where this problem doesn't exist? In particular, I need to address this problem in C#, and Java immediately, but likely will need it addressed in every other language in the near future. Java is one case where we don't have global value numbering. |
@MathiasVP Is there a way to get involved in the discussion and experimentation with barrier guard improvements? This, in my opinion, is the primary bottleneck in effectively using codeql in almost every query I've come across, created, or requests to assess/fix other people's queries. Is there someone at GitHub I can grab some time with to hash out some of the concerns and timelines? |
@MathiasVP (attempting to split my comments into separate comments to keep the threads easier to follow).
I was worried about this too, so I defined something like this to avoid re-evaluating an expression that is identical in hopes I would just get the definition (which is why I had hoped an SSA version of this would work):
The idea is when I recurse parsing a guard, I get a child expression based on the semantics of the expression. I want to get the original child back or something in the globalValueNumber set that isn't the same expression. The 'strict' boolean just allows me to force using globalValueNumbering. Example use inside a predicate called
I'm not sure if this addresses the explosion issue you are describing, but my intent here was to avoid an explosion. |
@MathiasVP I was looking at how to do this similar trick for csharp. I found this is a GVN module but it is private. Do you have any suggestions for csharp? |
I'm looking for general assistance on how to properly use codeql with barrier guards when the guard condition may not be computed in the guard itself, but instead also data traces into a guard:
Consider these two cases, where
use(x)
is a sink, but if it is guarded byMyGuard(x)
it is considered safe. The first case is the easy case where BarrierGuard (or SanitizerGuard) can be used fairly easily, and the latter case seems to require yet another dataflow:For the latter/harder case, what is the CodeQL paradigm to associate the guard to
x
such that I know use(x) is safe? Seems to me, out of the box, there is no flow fromx
to res. I could add an additional step, in a dataflow/taint analysis, but that would still not let me define barrier guard check conditions to associate that value with the originating variable. Perhaps there is a way to pass around misc. metadata too that will feed into a BarrierGuard check in these cases?The text was updated successfully, but these errors were encountered: