Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS: precise flow through calls to .apply() #10126

Merged
merged 8 commits into from
Aug 24, 2022
Merged

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Aug 22, 2022

For some context see this external contribution that never got merged: #6559

The improve performance of global dataflow by inlining a step predicate commit got the predicate from 4m8s to 1m36s on bwip-js.

An evaluation looks good. There are a few new results, and the performance looks good.
(The new js/unsafe-jquery-plugin results are from the new arguments node, the others are from the improved flow through arrays).

In an earlier evaluation I did a comparison where I disabled ReflectiveParametersNode: https://github.com/github/codeql-dca-main/tree/data/erik-krogh/myApply-no-parms-node-nightly-codescanning/reports
The query-suite is bigger, so it finds more results.
If you look at the alert-meta-comparison in the two evaluations you can see that most of the new flow doesn't have anything to do with ReflectiveParametersNode.
(Library inputs worked slightly different back when I did that evaluation, so those numbers are off).

/cc @yuske This is inspired by your attempt to do the same thing: #6559

@github-actions github-actions bot added the JS label Aug 22, 2022
@erik-krogh erik-krogh marked this pull request as ready for review August 22, 2022 12:26
@erik-krogh erik-krogh requested a review from a team as a code owner August 22, 2022 12:26
@calumgrant calumgrant requested a review from asgerf August 22, 2022 12:48
Copy link
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I really like the results of those evaluations! 👍

It seems we lost three tainted-nodes (open the raw document to see them). They all seem related to flow through forEach. Could you check if this is a regression or if this was spurious flow?

@@ -0,0 +1,85 @@
function foo1(arg1, arg2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function foo1(arg1, arg2) {
import * as dummy from 'dummy';
function foo1(arg1, arg2) {

Could you make sure the file is seen as a module, not a script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't change any results 👍


ReflectiveParametersNode() { this = TReflectiveParametersNode(function) }

override string toString() { result = "the arguments object of " + function.describe() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
override string toString() { result = "the arguments object of " + function.describe() }
override string toString() { result = "'arguments' object of " + function.describe() }

These don't usually start with the. I added quotes because I think it might avoid confusion in some cases

override BasicBlock getBasicBlock() { result = function.getEntry().getBasicBlock() }

/**
* Gets the function corresponding to this reflektive parameters node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the function corresponding to this reflektive parameters node.
* Gets the function whose `arguments` object is represented by this node.

@@ -68,7 +68,7 @@ module PrototypePollutingAssignment {
/**
* A parameter of an exported function, seen as a source prototype-polluting assignment.
*/
class ExternalInputSource extends Source, DataFlow::SourceNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to treat ReflectiveParmaetersNode as a SourceNode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReflectiveParametersNode is a source node (but when I try to put that into the extends clause I get a non-monotonic recursion).

But the source should still not be a SourceNode, because the getALibraryInputParameter() predicate does not return a ReflectiveParametersNode, that predicate returns either a parameter or a reference to the arguments variable.
The location of ReflectiveParametersNode looks bad as a source (the location is the entire function), so I avoid using ReflectiveParametersNode directly as a source.

@erik-krogh erik-krogh added the no-change-note-required This PR does not need a change note label Aug 23, 2022
@erik-krogh
Copy link
Contributor Author

erik-krogh commented Aug 23, 2022

It seems we lost three tainted-nodes (open the raw document to see them). They all seem related to flow through forEach. Could you check if this is a regression or if this was spurious flow?

The cause of those lost nodes is the change of ArrayCreationStep to be a PreCallGraphStep, which causes us to track this array precisely.
I'm not sure how that causes is to loose the tainted nodes.

But we still have taint on those lines.
E.g. here we still mark url and url.match(/&tid=(\d+)/i) as tainted, we just loose url.match, which is benign.

So I think it's OK that we loose those tainted nodes.

I'll start an evaluation on default.yml with meta queries to confirm that we don't loose more.
Edit: That evaluation failed, so we won't have that.

@erik-krogh erik-krogh requested a review from asgerf August 23, 2022 13:38
Copy link
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@yuske
Copy link

yuske commented Aug 24, 2022

Hi @erik-krogh,
It is great that you finished this idea! Unfortunately, I was busy with a paper where we used CodeQL for JS static analysis and did not have enough time to do the performance measures of the two approaches that we discussed at #6559. Btw, the paper now is accepted at USENIX Security'23 and the early version is already available https://arxiv.org/pdf/2207.11171.pdf. We increased the recall metrics for Prototype Pollution queries from 33% and 25% to 88% - 97% (also thanks to this .apply() and .call() functions support, other changes in the JS stdlib and PP queries). We detected many new Prototype Pollution vulnerabilities that lead to RCE (not all are described in the paper yet, a part of them is under review).

@erik-krogh
Copy link
Contributor Author

We increased the recall metrics for Prototype Pollution queries from 33% and 25% to 88% - 97% (also thanks to this .apply() and .call() functions support, other changes in the JS stdlib and PP queries). We detected many new Prototype Pollution vulnerabilities that lead to RCE (not all are described in the paper yet, a part of them is under review).

Great! You're very welcome to open pull requests with those changes.
I recommend doing one small change at a time, otherwise it might be hard for us to evaluate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
JS no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants