Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Extend String dataflow models #7019

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Marcono1234
Copy link
Contributor

Extends the dataflow String models to cover more methods.

I am not that familiar with the CSV model yet, and also don't know how to use the tools to automatically generate test cases.
Additionally there are a few open questions (see comments).

Any feedback is appreciated, but also feel free to just pick this changes up and complete them, if that would be easiest for you.

@github-actions github-actions bot added the Java label Nov 1, 2021
@Marcono1234 Marcono1234 force-pushed the marcono1234/extend-String-models branch from 55288d1 to 1fe4204 Compare November 1, 2021 22:10
@github-actions
Copy link
Contributor

github-actions bot commented Nov 1, 2021

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,519,30,13,,,7,,,10
+    Java Standard Library,``java.*``,3,543,30,13,,,7,,,10
-    Totals,,175,5341,408,13,6,10,107,33,1,66
+    Totals,,175,5365,408,13,6,10,107,33,1,66
  • Changes to framework-coverage-java.csv:
- java.io,3,,27,,3,,,,,,,,,,,,,,,,,,,26,1
+ java.io,3,,29,,3,,,,,,,,,,,,,,,,,,,28,1
- java.lang,,,51,,,,,,,,,,,,,,,,,,,,,41,10
+ java.lang,,,72,,,,,,,,,,,,,,,,,,,,,52,20
- java.util,,,429,,,,,,,,,,,,,,,,,,,,,15,414
+ java.util,,,430,,,,,,,,,,,,,,,,,,,,,15,415

@@ -11,19 +11,28 @@ private class StringSummaryCsv extends SummaryModelCsv {
"java.lang;String;false;concat;(String);;Argument[0];ReturnValue;taint",
"java.lang;String;false;concat;(String);;Argument[-1];ReturnValue;taint",
"java.lang;String;false;copyValueOf;;;Argument[0];ReturnValue;taint",
"java.lang;String;false;endsWith;;;Argument[-1];ReturnValue;taint",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed this because no other boolean returning methods are modelled.

"java.lang;String;false;indent;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;intern;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;join;;;Argument[0..1];ReturnValue;taint",
"java.lang;String;false;join;;;Argument[0..1];ReturnValue;taint", // TODO: ArrayElement of Argument?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this here and maybe also in other cases use ArrayElement of Argument instead? What is the difference between tracking taint form an array (or varargs?) compared to tracking taint from the elements?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because TaintTracking::Configuration specifies an allowImplicitRead method that allows implicit reads from an array, collection or map-value (i.e., we can treat y = propagateTaint(x) or sinkTaint(x) as if they were y = propagateTaint(x[n]) or sinkTaint(x.get(n))), in practice it doesn't make much difference for typical taint-tracking applications. In addition it specifies an isAdditionalFlowStep implementation that provides the opposite blurring of taint, saying that reading from a tainted map or tainted array yields tainted content.

In summary, for taint-tracking purposes Field and SyntheticField content continue to be faithfully distinguished, but array, collection and map content are blurred in both directions, with a tainted whole implying a tainted element and vice versa. You can specify them for clarity in the CSV description, but it won't make much (any?) difference to your results.

Comment on lines +54 to +55
// TODO: Should `append` and `write` be modelled for Appendable and Writer instead?
// Could then remove some of the modelled `append` method here and for StringBuilder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Appendable and Writer be modelled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a positive step to me

@github-actions
Copy link
Contributor

github-actions bot commented Nov 1, 2021

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,519,30,13,,,7,,,10
+    Java Standard Library,``java.*``,3,542,30,13,,,7,,,10
-    Totals,,175,5341,408,13,6,10,107,33,1,66
+    Totals,,175,5364,408,13,6,10,107,33,1,66
  • Changes to framework-coverage-java.csv:
- java.io,3,,27,,3,,,,,,,,,,,,,,,,,,,26,1
+ java.io,3,,29,,3,,,,,,,,,,,,,,,,,,,28,1
- java.lang,,,51,,,,,,,,,,,,,,,,,,,,,41,10
+ java.lang,,,71,,,,,,,,,,,,,,,,,,,,,51,20
- java.util,,,429,,,,,,,,,,,,,,,,,,,,,15,414
+ java.util,,,430,,,,,,,,,,,,,,,,,,,,,15,415

Comment on lines +78 to +79
"java.lang;StringBuilder;true;StringBuilder;(CharSequence);;Argument[0];Argument[-1];taint",
"java.lang;StringBuilder;true;StringBuilder;(String);;Argument[0];Argument[-1];taint",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have specified the parameter types here to avoid tracking StringBuilder(int) where the argument only represents the capacitiy.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's few enough new methods here that it's probably best to just write some tests by hand for these, particularly since there are lambda tests involved and AFAIK the auto test generation stuff can't do that yet.

Suggest writing the tests then running them without this in place, as a quick way to find string methods that have non-CSV models that can be cleaned up.

"java.lang;String;false;indent;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;intern;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;join;;;Argument[0..1];ReturnValue;taint",
"java.lang;String;false;join;;;Argument[0..1];ReturnValue;taint", // TODO: ArrayElement of Argument?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because TaintTracking::Configuration specifies an allowImplicitRead method that allows implicit reads from an array, collection or map-value (i.e., we can treat y = propagateTaint(x) or sinkTaint(x) as if they were y = propagateTaint(x[n]) or sinkTaint(x.get(n))), in practice it doesn't make much difference for typical taint-tracking applications. In addition it specifies an isAdditionalFlowStep implementation that provides the opposite blurring of taint, saying that reading from a tainted map or tainted array yields tainted content.

In summary, for taint-tracking purposes Field and SyntheticField content continue to be faithfully distinguished, but array, collection and map content are blurred in both directions, with a tainted whole implying a tainted element and vice versa. You can specify them for clarity in the CSV description, but it won't make much (any?) difference to your results.

Comment on lines +54 to +55
// TODO: Should `append` and `write` be modelled for Appendable and Writer instead?
// Could then remove some of the modelled `append` method here and for StringBuilder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a positive step to me

Comment on lines +29 to +34
"java.lang;String;false;replace;;;Argument[1];ReturnValue;taint",
"java.lang;String;false;replace;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;replaceAll;;;Argument[1];ReturnValue;taint",
"java.lang;String;false;replaceAll;;;Argument[-1];ReturnValue;taint",
"java.lang;String;false;replaceFirst;;;Argument[1];ReturnValue;taint",
"java.lang;String;false;replaceFirst;;;Argument[-1];ReturnValue;taint",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ones have non-CSV models that should be cleaned up concurrently (search for "replaceAll" for example)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants