Shared: Add DataFlow::DeduplicatePathGraph #14350

asgerf · 2023-10-02T09:42:19Z

Adds a shared parameterised module, DataFlow::DeduplicatePathGraph, for post-processing a PathGraph so that it doesn't result in duplicate alerts or alerts with multiple identical paths.

This issue usually arises from using FlowState, which is embedded in the PathNode but not rendered as part of its string value. This can thus result in paths that have different intermediate flow states but appear to be identical to the end-user.

The issue with multiple alerts, i.e. seemingly identical rows in the #select clause, is particularly bad for tools that attempt to diff results (such as DCA) but do not perform their own deduplication in advance.

The module works by projecting each PathNode down to its (node, toString) value, which is closer to what the end-user actually sees. Viewing the path graph as an NFA that accepts input symbols of type (node, toString), we try to minimise this NFA by merging states.
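To illustrate the projection idea, here is a minimal Python sketch. It is not the module's implementation: the tuples, the `project` helper and the naive "merge everything with the same key" rule are assumptions made for the example, whereas the real module only merges states where it considers this safe.

```python
from collections import defaultdict

# Hypothetical path nodes: (node, to_string, flow_state). The flow state is
# part of a path node's identity but is not shown to the end-user.
edges = [
    (("src", "source()", "s0"), ("mid", "x", "s1")),
    (("src", "source()", "s0"), ("mid", "x", "s2")),
    (("mid", "x", "s1"), ("snk", "sink(x)", "s0")),
    (("mid", "x", "s2"), ("snk", "sink(x)", "s0")),
]


def project(path_node):
    """Drop the hidden flow state, keeping only what the user sees."""
    node, to_string, _state = path_node
    return (node, to_string)


def merge_by_projection(edges):
    """Naively merge all path nodes that share a (node, toString) key."""
    return sorted({(project(a), project(b)) for a, b in edges})


def count_paths(edges, start, end):
    """Count distinct paths from start to end in an acyclic edge list."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)

    def walk(n):
        return 1 if n == end else sum(walk(m) for m in succ[n])

    return walk(start)


merged = merge_by_projection(edges)
# Two paths that look identical to the user collapse into a single path.
print(count_paths(edges, ("src", "source()", "s0"), ("snk", "sink(x)", "s0")))  # 2
print(count_paths(merged, ("src", "source()"), ("snk", "sink(x)")))  # 1
```

The two original paths differ only in the intermediate flow state, so after projection they become the same sequence of (node, toString) symbols.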

This is needed by the JavaScript data-flow migration, but I've put this in its own PR so it can be reviewed separately. I've used the library in a Ruby query that had some very ad-hoc alert deduplication logic. Note that the expected output diff is mainly due to reordering of result sets in the output.

hvitved · 2023-10-03T19:11:36Z

I wonder if this should be the default graph exposed by PathGraph, and then have the original graph exposed as, say, RawPathGraph?

asgerf · 2023-10-11T12:32:06Z

I've found a bug in the handling of subpaths that I'd like to fix, so taking this back into draft state for now.

shared/dataflow/codeql/dataflow/DataFlow.qll

+      /**
+       * Like `getOutDegreeFromPathNode` except counts `subpath` tuples.
+       */
+      private int getSubpathOutDegreeFromPathNode(InputPathNode pathNode) {


shared/dataflow/codeql/dataflow/DataFlow.qll

+      /**
+       * Like `getOutDegreeFromNode` except counts `subpath` tuples.
+       */
+      private int getSubpathOutDegreeFromNode(Node node, string toString) {


aschackmull · 2024-12-16T09:29:41Z

java/ql/test/library-tests/dataflow/deduplicate-path-graph/test.ql

+  node.getNode().asExpr() = propagateCall(state) and call = false
+  or
+  exists(Graph::PathNode prev | reachableFromPropagate(prev, state, call) |
+    Graph::edges(prev, node, _, _)


I think you need to amend this case slightly if you want the call bit to have meaning:

Suggested change

-    Graph::edges(prev, node, _, _)
+    Graph::edges(prev, node, _, _) and
+    not Graph::subpaths(prev, node, _, _)

aschackmull · 2024-12-16T09:31:06Z

java/ql/test/library-tests/dataflow/deduplicate-path-graph/test.ql

+  exists(Graph::PathNode prev | reachableFromPropagate(prev, state, call) |
+    Graph::edges(prev, node, _, _)
+    or
+    Graph::subpaths(prev, _, _, node) // arg -> out


This line ought to be superfluous, but it can't hurt.

aschackmull · 2024-12-16T09:51:35Z

shared/dataflow/codeql/dataflow/DataFlow.qll

+
+      /** Gets a successor of `node` including subpath flow-through. */
+      InputPathNode stepEx(InputPathNode node) {
+        step(node, result, _, _)


I think you meant to exclude subpath enter- and exit-steps here, right?

Suggested change

-        step(node, result, _, _)
+        step(node, result, _, _) and
+        not result = enterSubpathStep(node) and
+        not result = exitSubpathStep(node)

aschackmull · 2024-12-16T09:53:00Z

shared/dataflow/codeql/dataflow/DataFlow.qll

+      InputPathNode stepEx(InputPathNode node) {
+        step(node, result, _, _)
+        or
+        subpathStep(node, _, _, result) // assuming the input is pruned properly, all subpaths have flow-through


This should already be present in step, so it should be superfluous here, but it cannot hurt, so I don't mind keeping it. But maybe add a comment about this.

aschackmull · 2024-12-16T10:12:39Z

ruby/ql/src/queries/security/cwe-094/CodeInjection.ql

@@ -16,20 +16,9 @@

 private import codeql.ruby.AST
 private import codeql.ruby.security.CodeInjectionQuery
-import CodeInjectionFlow::PathGraph
+import DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>


How about not re-exporting the dedup'ed pathgraph, and instead export the boilerplate-translated flowPath predicate? Then we'd write

Suggested change

-import DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>
+module CodeInjectionFlowDeDup = DataFlow::DeduplicatePathGraph<CodeInjectionFlow::PathNode, CodeInjectionFlow::PathGraph>;
+import CodeInjectionFlowDeDup::PathGraph

aschackmull · 2024-12-16T10:13:15Z

ruby/ql/src/queries/security/cwe-094/CodeInjection.ql

+from PathNode source, PathNode sink
+where CodeInjectionFlow::flowPath(source.getAnOriginalPathNode(), sink.getAnOriginalPathNode())


.. and

Suggested change

-from PathNode source, PathNode sink
-where CodeInjectionFlow::flowPath(source.getAnOriginalPathNode(), sink.getAnOriginalPathNode())
+from CodeInjectionFlowDeDup::PathNode source, CodeInjectionFlowDeDup::PathNode sink
+where CodeInjectionFlowDeDup::flowPath(source, sink)

aschackmull · 2024-12-16T10:17:42Z

There's still the open question about whether this algorithm can introduce FP path explanations, i.e. whether all collapsed paths can be translated back to a path in the original graph. We tried to construct a proof, but it's tricky to properly account for subpaths. But that probably shouldn't hold the PR back - it seems plausibly good enough, and we're still referring to the original flowPath predicate, so we cannot introduce any actual new FP results, so it's reasonably safe.
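The concern about false-positive path explanations can be made concrete with a small Python sketch. It deliberately uses naive merging by display key, which is an assumption for illustration and not the algorithm in this PR. The node names and the `#` suffix standing in for a hidden flow state are hypothetical.

```python
from collections import defaultdict

# Hypothetical original path graph. "m#1" and "m#2" are two path nodes that
# render identically but carry different hidden flow states.
original = [
    ("src", "a"), ("a", "m#1"), ("m#1", "c"), ("c", "snk"),
    ("src", "b"), ("b", "m#2"), ("m#2", "d"), ("d", "snk"),
]


def key(n):
    """What the user sees: the node name without its hidden state."""
    return n.split("#")[0]


def all_paths(edges, start, end):
    """Enumerate all paths from start to end in an acyclic graph."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)

    def walk(n, prefix):
        prefix = prefix + [n]
        if n == end:
            yield tuple(prefix)
        for m in succ[n]:
            yield from walk(m, prefix)

    return set(walk(start, []))


# Naive merge: collapse every node onto its display key.
merged = {(key(a), key(b)) for a, b in original}

original_paths = {tuple(key(n) for n in p) for p in all_paths(original, "src", "snk")}
merged_paths = all_paths(merged, "src", "snk")

# Paths visible in the merged graph with no counterpart in the original.
spurious = merged_paths - original_paths
print(sorted(spurious))
```

Here the (src, snk) pair is a genuine result, since the original graph connects them, but the merged graph also offers src → a → m → d → snk and src → b → m → c → snk, which correspond to no original path. Because alerts are still selected through the original flowPath predicate, such merging can at worst add misleading explanations for real results, not new results.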

aschackmull

LGTM

asgerf · 2024-12-17T15:23:53Z

Doing another DCA run just to be safe

asgerf · 2024-12-18T13:04:23Z

DCA run looks fine

github-actions bot added documentation Ruby DataFlow Library labels Oct 2, 2023

asgerf marked this pull request as ready for review October 2, 2023 12:27

asgerf requested a review from a team as a code owner October 2, 2023 12:27

asgerf mentioned this pull request Oct 9, 2023

[Feature branch] JS: Migrate to shared dataflow library #14412

Merged

asgerf marked this pull request as draft October 11, 2023 12:30

asgerf force-pushed the shared/deduplicate-path-graph branch from bc0ed45 to dc94503 Compare October 11, 2023 14:55

github-actions bot added the Java label Oct 11, 2023

github-advanced-security bot found potential problems Oct 11, 2023

asgerf marked this pull request as ready for review October 13, 2023 11:32

asgerf requested a review from a team as a code owner October 13, 2023 11:32

sidshank requested review from hvitved and aschackmull October 23, 2023 12:58

asgerf added 6 commits December 11, 2024 11:29

Shared: Add DataFlow::DeduplicatePathGraph

cba7b98

Shared: change note

8efdc2d

Java: add test for spurious flow from path graph deduplication

0eb543e

Shared: use a call bit when tracking reachability to/from a discriminator

5aa1242

JS: Update to account for key,val pairs on edges

815581d

Ruby: use DeduplicatePathGraph in CodeInjection query

f9c0ba3

asgerf force-pushed the shared/deduplicate-path-graph branch from dc94503 to f9c0ba3 Compare December 11, 2024 10:50

asgerf added 3 commits December 11, 2024 13:19

Java: MethodAccess -> MethodCall

7363888

Java: update test to account for key,val

afdbf2c

Java: update test output with provenance

889100a

aschackmull reviewed Dec 16, 2024

aschackmull reviewed Dec 16, 2024

aschackmull reviewed Dec 16, 2024

asgerf and others added 2 commits December 16, 2024 13:14

Apply suggestions from code review

0edb306

Co-authored-by: Anders Schack-Mulligen <[email protected]>

Shared: Ensure subpath-induced edges are handled properly

f2968f4

Argument-passing and flow-through edges are present in 'edges' in addition to 'subpaths', but the implementation didn't take this into account.

aschackmull previously approved these changes Dec 16, 2024

asgerf added 2 commits December 17, 2024 11:15

Shared: Show test failures

950ae44

Shared: Fix propagation of call bit

8340841

asgerf dismissed aschackmull’s stale review via 8340841 December 17, 2024 10:16

Shared: autoformat

e34fbc8

aschackmull approved these changes Dec 17, 2024

asgerf merged commit be939dc into github:main Dec 18, 2024
47 checks passed
