Java: Extend String dataflow models #7019
55288d1 to 1fe4204
Differences in coverage: generated file changes for java
- Java Standard Library,``java.*``,3,519,30,13,,,7,,,10
+ Java Standard Library,``java.*``,3,543,30,13,,,7,,,10
- Totals,,175,5341,408,13,6,10,107,33,1,66
+ Totals,,175,5365,408,13,6,10,107,33,1,66
- java.io,3,,27,,3,,,,,,,,,,,,,,,,,,,26,1
+ java.io,3,,29,,3,,,,,,,,,,,,,,,,,,,28,1
- java.lang,,,51,,,,,,,,,,,,,,,,,,,,,41,10
+ java.lang,,,72,,,,,,,,,,,,,,,,,,,,,52,20
- java.util,,,429,,,,,,,,,,,,,,,,,,,,,15,414
+ java.util,,,430,,,,,,,,,,,,,,,,,,,,,15,415
@@ -11,19 +11,28 @@ private class StringSummaryCsv extends SummaryModelCsv {
"java.lang;String;false;concat;(String);;Argument[0];ReturnValue;taint",
"java.lang;String;false;concat;(String);;Argument[-1];ReturnValue;taint",
"java.lang;String;false;copyValueOf;;;Argument[0];ReturnValue;taint",
"java.lang;String;false;endsWith;;;Argument[-1];ReturnValue;taint",
Have removed this because no other `boolean`-returning methods are modelled.
"java.lang;String;false;indent;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;intern;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;join;;;Argument[0..1];ReturnValue;taint",
"java.lang;String;false;join;;;Argument[0..1];ReturnValue;taint", // TODO: ArrayElement of Argument?
Should this, and maybe other cases as well, use `ArrayElement of Argument` instead? What is the difference between tracking taint from an array (or varargs?) and tracking taint from its elements?
Because `TaintTracking::Configuration` specifies an `allowImplicitRead` method that allows implicit reads from an array, collection or map-value (i.e., we can treat `y = propagateTaint(x)` or `sinkTaint(x)` as if they were `y = propagateTaint(x[n])` or `sinkTaint(x.get(n))`), in practice it doesn't make much difference for typical taint-tracking applications. In addition it specifies an `isAdditionalFlowStep` implementation that provides the opposite blurring of taint, saying that reading from a tainted map or tainted array yields tainted content.
In summary, for taint-tracking purposes `Field` and `SyntheticField` content continue to be faithfully distinguished, but array, collection and map content are blurred in both directions, with a tainted whole implying a tainted element and vice versa. You can specify them for clarity in the CSV description, but it won't make much (any?) difference to your results.
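To make the array case concrete, here is a minimal Java sketch of the `String.join` situation being discussed. The `source` helper and the class name are hypothetical stand-ins used only for illustration; they are not part of this PR or of CodeQL. The point is that the tainted value sits in an array element rather than in the array reference itself, yet it still ends up in the joined result.

```java
public class JoinTaintExample {
    // Hypothetical taint source, used for illustration only.
    public static String source() {
        return "user-input";
    }

    // Only one element of the array is tainted; the array itself is built locally.
    public static String joinElements() {
        String[] parts = { "safe", source() };
        // Argument[1] is the varargs array; its element content reaches the return value.
        return String.join(",", parts);
    }

    // The delimiter (Argument[0]) also ends up in the result.
    public static String joinWithTaintedDelimiter() {
        return String.join(source(), "a", "b");
    }
}
```

Whether the model names the array or its elements, a taint-tracking query would be expected to report both methods, given the implicit reads described above.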
// TODO: Should `append` and `write` be modelled for Appendable and Writer instead?
// Could then remove some of the modelled `append` method here and for StringBuilder
Should `Appendable` and `Writer` be modelled?
This sounds like a positive step to me
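As a rough illustration of why modelling at the interface level could help, the Java sketch below appends through `Appendable` only, so the same call site serves both `StringBuilder` and `StringWriter`. The class and method names are hypothetical and chosen for this example; nothing here is taken from the PR.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class AppendableExample {
    // Appends through the Appendable interface, without knowing the concrete type.
    public static void appendTo(Appendable out, String value) {
        try {
            out.append(value);
        } catch (IOException e) {
            // In-memory targets such as StringBuilder and StringWriter do not throw here.
            throw new UncheckedIOException(e);
        }
    }

    public static String viaStringBuilder(String value) {
        StringBuilder sb = new StringBuilder();
        appendTo(sb, value);
        return sb.toString();
    }

    public static String viaStringWriter(String value) {
        StringWriter sw = new StringWriter();
        appendTo(sw, value);
        return sw.toString();
    }
}
```

A model written only for `StringBuilder.append` would not describe the call inside `appendTo`, because the static type at that call is `Appendable`.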
Differences in coverage: generated file changes for java
- Java Standard Library,``java.*``,3,519,30,13,,,7,,,10
+ Java Standard Library,``java.*``,3,542,30,13,,,7,,,10
- Totals,,175,5341,408,13,6,10,107,33,1,66
+ Totals,,175,5364,408,13,6,10,107,33,1,66
- java.io,3,,27,,3,,,,,,,,,,,,,,,,,,,26,1
+ java.io,3,,29,,3,,,,,,,,,,,,,,,,,,,28,1
- java.lang,,,51,,,,,,,,,,,,,,,,,,,,,41,10
+ java.lang,,,71,,,,,,,,,,,,,,,,,,,,,51,20
- java.util,,,429,,,,,,,,,,,,,,,,,,,,,15,414
+ java.util,,,430,,,,,,,,,,,,,,,,,,,,,15,415
"java.lang;StringBuilder;true;StringBuilder;(CharSequence);;Argument[0];Argument[-1];taint",
"java.lang;StringBuilder;true;StringBuilder;(String);;Argument[0];Argument[-1];taint",
Have specified the parameter types here to avoid tracking `StringBuilder(int)`, where the argument only represents the capacity.
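The distinction between the constructors can be seen in plain Java. This is a small sketch with a hypothetical class name: the `String` and `CharSequence` constructors copy their argument into the builder, while the `int` constructor only sets the initial capacity and contributes no content.

```java
public class StringBuilderCtorExample {
    // StringBuilder(String): the argument becomes the initial content.
    public static String fromString(String value) {
        return new StringBuilder(value).toString();
    }

    // StringBuilder(CharSequence): the argument also becomes the initial content.
    public static String fromCharSequence(CharSequence value) {
        return new StringBuilder(value).toString();
    }

    // StringBuilder(int): the argument is only a capacity hint; the builder starts empty.
    public static String fromCapacity(int capacity) {
        return new StringBuilder(capacity).toString();
    }
}
```

Leaving the signature column empty in the CSV row would presumably match all three constructors, including the capacity one, which is why the two rows spell out `(String)` and `(CharSequence)`.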
There are few enough new methods here that it's probably best to just write some tests by hand for these, particularly since there are lambda tests involved and AFAIK the auto test generation stuff can't do that yet.
I suggest writing the tests and then running them without this change in place, as a quick way to find string methods that have non-CSV models that can be cleaned up.
"java.lang;String;false;replace;;;Argument[1];ReturnValue;taint",
"java.lang;String;false;replace;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;replaceAll;;;Argument[1];ReturnValue;taint",
"java.lang;String;false;replaceAll;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;replaceFirst;;;Argument[1];ReturnValue;taint",
"java.lang;String;false;replaceFirst;;;Argument[-1];ReturnValue;taint",
These ones have non-CSV models that should be cleaned up concurrently (search for `"replaceAll"`, for example).
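For reference, the two flows these rows describe look like this in plain Java. The sketch uses a hypothetical class name and hypothetical inputs: `Argument[1]` is the replacement value and `Argument[-1]` is the string the method is called on, and either one can carry content into the result.

```java
public class ReplaceTaintExample {
    // Argument[1] -> ReturnValue: the replacement text appears in the result.
    public static String taintedReplacement(String tainted) {
        return "Hello, NAME!".replace("NAME", tainted);
    }

    // Argument[-1] -> ReturnValue: the receiver's remaining text appears in the result.
    public static String taintedReceiver(String tainted) {
        return tainted.replaceAll("\\s+", "_");
    }

    // The same Argument[1] flow for replaceFirst, which replaces only the first match.
    public static String taintedFirstReplacement(String tainted) {
        return "x-x".replaceFirst("x", tainted);
    }
}
```

Methods shaped like these could serve as hand-written test cases, with the parameter standing in for a tainted value.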
Extends the dataflow String models to cover more methods.
I am not that familiar with the CSV model yet, and also don't know how to use the tools to automatically generate test cases.
Additionally there are a few open questions (see comments).
Any feedback is appreciated, but also feel free to just pick these changes up and complete them, if that would be easiest for you.