Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared: Add DataFlow::DeduplicatePathGraph #14350

Merged
merged 14 commits into from
Dec 18, 2024

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Oct 2, 2023

Adds a shared parameterised module, DataFlow::DeduplicatePathGraph, for post-processing a PathGraph so that it doesn't result in duplicate alerts or alerts with multiple identical paths.

This issue usually arises from using FlowState, which is embedded in the PathNode but not rendered as part of its string value. This can thus result in paths that have different intermediate flow states but appear to be identical to the end-user.

The issue with multiple alerts, i.e. seemingly-identical rows in the #select clause, is particularly bad for tools that attempt to diff results (such as DCA) but does not perform its own deduplication in advance.

The module works by projecting PathNode down to their (node, toString) values, which is closer to what the end-user actually sees in the end. By seeing the path graph as an NFA that accepts input symbols of type (node, toString) we try to minimise this NFA by merging states.

This is needed by the JavaScript data-flow migration, but I've put this in its own PR so it can be reviewed separately. I've used the library in a Ruby query that had some very ad-hoc alert deduplication logic. Note that the expected output diff is mainly due to reordering of result sets in the output.

@asgerf asgerf marked this pull request as ready for review October 2, 2023 12:27
@asgerf asgerf requested a review from a team as a code owner October 2, 2023 12:27
@hvitved
Copy link
Contributor

hvitved commented Oct 3, 2023

I wonder if this should be the default graph exposed by PathGraph, and then have the original graph exposed as, say, RawPathGraph?

@asgerf
Copy link
Contributor Author

asgerf commented Oct 11, 2023

I've found a bug in the handling of subpaths that I'd like to fix, so taking this back into draft state for now.

@asgerf asgerf force-pushed the shared/deduplicate-path-graph branch from bc0ed45 to dc94503 Compare October 11, 2023 14:55
@github-actions github-actions bot added the Java label Oct 11, 2023
/**
* Like `getOutDegreeFromPathNode` except counts `subpath` tuples.
*/
private int getSubpathOutDegreeFromPathNode(InputPathNode pathNode) {

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for pathNode, but the QLDoc mentions subpath
/**
* Like `getOutDegreeFromNode` except counts `subpath` tuples.
*/
private int getSubpathOutDegreeFromNode(Node node, string toString) {

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for node, or toString, but the QLDoc mentions subpath
@asgerf asgerf marked this pull request as ready for review October 13, 2023 11:32
@asgerf asgerf requested a review from a team as a code owner October 13, 2023 11:32
@asgerf asgerf force-pushed the shared/deduplicate-path-graph branch from dc94503 to f9c0ba3 Compare December 11, 2024 10:50
node.getNode().asExpr() = propagateCall(state) and call = false
or
exists(Graph::PathNode prev | reachableFromPropagate(prev, state, call) |
Graph::edges(prev, node, _, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to amend this case slightly if you want the call bit to have meaning:

Suggested change
Graph::edges(prev, node, _, _)
Graph::edges(prev, node, _, _) and
not Graph::subpaths(prev, node, _, _)

exists(Graph::PathNode prev | reachableFromPropagate(prev, state, call) |
Graph::edges(prev, node, _, _)
or
Graph::subpaths(prev, _, _, node) // arg -> out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line ought to be superfluous, but it can't hurt.


/** Gets a successor of `node` including subpath flow-through. */
InputPathNode stepEx(InputPathNode node) {
step(node, result, _, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to exclude subpath enter- and exit-steps here, right?

Suggested change
step(node, result, _, _)
step(node, result, _, _) and
not result = enterSubpathStep(node) and
not result = exitSubpathStep(node)

InputPathNode stepEx(InputPathNode node) {
step(node, result, _, _)
or
subpathStep(node, _, _, result) // assuming the input is pruned properly, all subpaths have flow-through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already be present in step, so it should be superfluous here, but it cannot hurt, so I don't mind keeping it. But maybe add a comment about this.

@@ -16,20 +16,9 @@

private import codeql.ruby.AST
private import codeql.ruby.security.CodeInjectionQuery
import CodeInjectionFlow::PathGraph
import DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about not re-exporting the dedup'ed pathgraph, and instead export the boilerplate-translated flowPath predicate? Then we'd write

Suggested change
import DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>
module CodeInjectionFlowDeDup = DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>;
import CodeInjectionFlowDeDup::PathGraph

Comment on lines +21 to +22
from PathNode source, PathNode sink
where CodeInjectionFlow::flowPath(source.getAnOriginalPathNode(), sink.getAnOriginalPathNode())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. and

Suggested change
from PathNode source, PathNode sink
where CodeInjectionFlow::flowPath(source.getAnOriginalPathNode(), sink.getAnOriginalPathNode())
from CodeInjectionFlowDeDup::PathNode source, CodeInjectionFlowDeDup::PathNode sink
where CodeInjectionFlowDeDup::flowPath(source, sink)

@aschackmull
Copy link
Contributor

There's still the open question about whether this algorithm can introduce FP path explanations, i.e. whether all collapsed paths can be translated back to a path in the original graph. We tried to construct a proof, but it's tricky to properly account for subpaths. But that probably shouldn't hold the PR back - it seems plausibly good enough, and we're still referring to the original flowPath predicate, so we cannot introduce any actual new FP results, so it's reasonably safe.

asgerf and others added 2 commits December 16, 2024 13:14
Co-authored-by: Anders Schack-Mulligen <[email protected]>
Argument-passing and flow-through edges are present in 'edges' in addition to 'subpaths', but the implementation didn't take this into account.
aschackmull
aschackmull previously approved these changes Dec 16, 2024
Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asgerf
Copy link
Contributor Author

asgerf commented Dec 17, 2024

Doing another DCA run just to be safe

@asgerf
Copy link
Contributor Author

asgerf commented Dec 18, 2024

DCA run looks fine

@asgerf asgerf merged commit be939dc into github:main Dec 18, 2024
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants