-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared: Add DataFlow::DeduplicatePathGraph #14350
Conversation
I wonder if this should be the default graph exposed by |
I've found a bug in the handling of |
bc0ed45
to
dc94503
Compare
/** | ||
* Like `getOutDegreeFromPathNode` except counts `subpath` tuples. | ||
*/ | ||
private int getSubpathOutDegreeFromPathNode(InputPathNode pathNode) { |
Check warning
Code scanning / CodeQL
Missing QLDoc for parameter Warning
/** | ||
* Like `getOutDegreeFromNode` except counts `subpath` tuples. | ||
*/ | ||
private int getSubpathOutDegreeFromNode(Node node, string toString) { |
Check warning
Code scanning / CodeQL
Missing QLDoc for parameter Warning
dc94503
to
f9c0ba3
Compare
node.getNode().asExpr() = propagateCall(state) and call = false | ||
or | ||
exists(Graph::PathNode prev | reachableFromPropagate(prev, state, call) | | ||
Graph::edges(prev, node, _, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to amend this case slightly if you want the call bit to have meaning:
Graph::edges(prev, node, _, _) | |
Graph::edges(prev, node, _, _) and | |
not Graph::subpaths(prev, node, _, _) |
exists(Graph::PathNode prev | reachableFromPropagate(prev, state, call) | | ||
Graph::edges(prev, node, _, _) | ||
or | ||
Graph::subpaths(prev, _, _, node) // arg -> out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line ought to be superfluous, but it can't hurt.
|
||
/** Gets a successor of `node` including subpath flow-through. */ | ||
InputPathNode stepEx(InputPathNode node) { | ||
step(node, result, _, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to exclude subpath enter- and exit-steps here, right?
step(node, result, _, _) | |
step(node, result, _, _) and | |
not result = enterSubpathStep(node) and | |
not result = exitSubpathStep(node) |
InputPathNode stepEx(InputPathNode node) { | ||
step(node, result, _, _) | ||
or | ||
subpathStep(node, _, _, result) // assuming the input is pruned properly, all subpaths have flow-through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should already be present in step
, so it should be superfluous here, but it cannot hurt, so I don't mind keeping it. But maybe add a comment about this.
@@ -16,20 +16,9 @@ | |||
|
|||
private import codeql.ruby.AST | |||
private import codeql.ruby.security.CodeInjectionQuery | |||
import CodeInjectionFlow::PathGraph | |||
import DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about not re-exporting the dedup'ed pathgraph, and instead export the boilerplate-translated flowPath
predicate? Then we'd write
import DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph> | |
module CodeInjectionFlowDeDup = DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>; | |
import CodeInjectionFlowDeDup::PathGraph |
from PathNode source, PathNode sink | ||
where CodeInjectionFlow::flowPath(source.getAnOriginalPathNode(), sink.getAnOriginalPathNode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. and
from PathNode source, PathNode sink | |
where CodeInjectionFlow::flowPath(source.getAnOriginalPathNode(), sink.getAnOriginalPathNode()) | |
from CodeInjectionFlowDeDup::PathNode source, CodeInjectionFlowDeDup::PathNode sink | |
where CodeInjectionFlowDeDup::flowPath(source, sink) |
There's still the open question about whether this algorithm can introduce FP path explanations, i.e. whether all collapsed paths can be translated back to a path in the original graph. We tried to construct a proof, but it's tricky to properly account for subpaths. But that probably shouldn't hold the PR back - it seems plausibly good enough, and we're still referring to the original |
Co-authored-by: Anders Schack-Mulligen <[email protected]>
Argument-passing and flow-through edges are present in 'edges' in addition to 'subpaths', but the implementation didn't take this into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Doing another DCA run just to be safe |
DCA run looks fine |
Adds a shared parameterised module,
DataFlow::DeduplicatePathGraph
, for post-processing aPathGraph
so that it doesn't result in duplicate alerts or alerts with multiple identical paths.This issue usually arises from using
FlowState
, which is embedded in thePathNode
but not rendered as part of its string value. This can thus result in paths that have different intermediate flow states but appear to be identical to the end-user.The issue with multiple alerts, i.e. seemingly-identical rows in the
#select
clause, is particularly bad for tools that attempt to diff results (such as DCA) but does not perform its own deduplication in advance.The module works by projecting
PathNode
down to their(node, toString)
values, which is closer to what the end-user actually sees in the end. By seeing the path graph as an NFA that accepts input symbols of type(node, toString)
we try to minimise this NFA by merging states.This is needed by the JavaScript data-flow migration, but I've put this in its own PR so it can be reviewed separately. I've used the library in a Ruby query that had some very ad-hoc alert deduplication logic. Note that the expected output diff is mainly due to reordering of result sets in the output.