Losing lines when using the taint tracking with aliasing on C# #11041

asoifer · 2022-10-29T17:19:17Z

Hi,

I'm still trying to build a kind of static slicer using CodeQL.
Apart from the control dependencies (which I'm handling by adding some predicates in isAdditionalTaintStep), I'm losing lines in several contexts, especially when working with data structures (trees, hash sets, lists, and so on).

For example, I put together this case, which uses a list.

public static void TestP1(int p)
{
	var list = new List<ExampleClass>();
	var elem = new ExampleClass();
	
	list.Add(elem);
	elem.ExampleProperty = p;

	var x = list[0].ExampleProperty;
	var y = elem.ExampleProperty;
	var a = x; // A
	var b = y; // B
}

public static void TestP2(int p)
{
	var list = new List<ExampleClass>();
	var elem = new ExampleClass();

	elem.ExampleProperty = p;
	list.Add(elem);

	var x = list[0].ExampleProperty;
	var y = elem.ExampleProperty;
	var a = x; // C
	var b = y; // D
}

As you can see, the only difference between TestP1 and TestP2 is the order of the list.Add call and the assignment of p to ExampleProperty.

I ran this simple query on both methods; the results are shown below.

import csharp
import semmle.code.csharp.dataflow.DataFlow
import semmle.code.csharp.dataflow.TaintTracking

class GlobalTestConfiguration extends TaintTracking::Configuration {
  GlobalTestConfiguration() { this = "GlobalTestConfiguration" }

  override predicate isSink(DataFlow::Node sink) {
    1 = 1
  }

  override predicate isSource(DataFlow::Node source) {
    1 = 1
  }
}

from DataFlow::Node source, DataFlow::Node sink, GlobalTestConfiguration config
where 
      sink.getLocation().getFile().getBaseName() = "Program.cs"
  and sink.getLocation().getStartLine() = 1 // --> replace the line number with A, B, C, and D
  and config.hasFlow(source, sink)
select source.getLocation().getFile().getBaseName() + ": " + source.getLocation().getStartLine().toString()

Results for B, C, and D:
1: Use of parameter "p"
2: Assignment to "ExampleProperty"
3: Previous assignment (to x or y respectively)
4: Last assignment (of a or b)

The problem happens with A, where I get this result:
1: Previous assignment (to x or y respectively)
2: Last assignment (of a or b)
The assignment to ExampleProperty is not recognized.
The reason? The assignment happens after we add the element to the list, and then we read the element back through the list.

Since there is no memory model behind the analysis that recognizes the aliasing between the first element of the list (list[0]) and elem, that assignment is not reported as flowing to A.

If I'm not mistaken (and there is no big mistake in my query), this seems unsound, even for a taint tracker.

What's more, I found many lines that should be tracked but are missing, in different situations and with different structures.
I used the taint tracker on the result values of the Olden benchmark (which only uses simple structures, without relying on complex libraries).

Another, simpler example: try the taint tracker (for C#) on this case:

var i = 1;
i++;
var j = i;

The taint tracker does not reach the first line; the analysis stops at the second one.
This happens even after adding this predicate (which covers arithmetic operations):

(nodeA.asExpr().(ArithmeticOperation).getAnOperand() = nodeB.asExpr())

After analyzing this case, it seems to be related to the type of the nodes (SSA nodes vs. expression nodes, or something like that), but I really don't know.

So, after my incredibly long post, if you are still here, my question is: am I doing something wrong in the query?

Thank you so much to anyone who can give me some clues =)

smowton · 2022-10-31T09:27:45Z

Maintainer

Regarding increment operators, I note that https://github.com/github/codeql/blob/main/csharp/ql/lib/semmle/code/csharp/dataflow/internal/TaintTrackingPrivate.qll#L39 includes a taint step for unary logical operations but not for arithmetic ones. Have you tried adding an additional step from a UnaryArithmeticOperation's operand to the UnaryArithmeticOperation itself?
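
For illustration only, the suggested step could be sketched roughly as below. This is an untested sketch, not a confirmed fix: the configuration name is hypothetical, the source and sink predicates simply mirror the "everything" choice in the original query, and it assumes that UnaryArithmeticOperation exposes its operand through getOperand().

```ql
import csharp
import semmle.code.csharp.dataflow.DataFlow
import semmle.code.csharp.dataflow.TaintTracking

// Hypothetical configuration name, used only for this sketch.
class UnaryArithmeticTestConfiguration extends TaintTracking::Configuration {
  UnaryArithmeticTestConfiguration() { this = "UnaryArithmeticTestConfiguration" }

  // As in the original query: every node is a source and a sink.
  override predicate isSource(DataFlow::Node source) { any() }

  override predicate isSink(DataFlow::Node sink) { any() }

  // Extra step: taint on the operand of a unary arithmetic operation
  // (for example the `i` in `i++` or `-i`) flows to the operation itself.
  override predicate isAdditionalTaintStep(DataFlow::Node pred, DataFlow::Node succ) {
    succ.asExpr().(UnaryArithmeticOperation).getOperand() = pred.asExpr()
  }
}
```

This only adds the operand-to-operation edge. Whether that is enough for a standalone `i++;` statement, where later reads of `i` see the updated variable rather than the value of the expression, is not confirmed in this thread.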

Regarding alias analysis: generally speaking, CodeQL dataflow analyses don't use a general-purpose (conservative) alias analysis. They trade off false positives caused by spurious aliasing against false negatives caused by true aliasing that wasn't accounted for, and against the cost of the analysis. A static slicer, on the other hand, requires a conservative alias analysis for the slicing to be sound, which suggests you would need to write a query that is significantly different from an ordinary taint-tracking analysis.

A more complete alias analysis has been attempted in CodeQL before: for example, the C/C++ analysis has https://codeql.github.com/codeql-standard-libraries/cpp/semmle/code/cpp/pointsto/PointsTo.qll/module.PointsTo.html, although I don't know whether that analysis is fully conservative or merely more conservative than DataFlow and TaintTracking's simplified view of aliasing. I also note that the documentation for that library says: "WARNING: This library may perform poorly on very large projects. Consider using another library such as semmle.code.cpp.dataflow.DataFlow instead."

asoifer Oct 31, 2022
Author

I see that the PointsTo library is only available for C/C++.
So, getting the same for C# will require more effort than I was expecting.
Thank you so much for your quick answer!

garbervetsky · 2022-11-01T18:34:40Z

Hi @smowton. Regarding the alias analysis: do you know if there is some basic alias analysis for Java and/or C#? I see that there seems to be some call resolution. Do you use some sort of type tracking to resolve the calls?

Thanks!

smowton Nov 1, 2022
Maintainer

Yes. Rather than me trying to describe it in detail and probably getting it wrong, it might be easiest if you refer to https://github.com/github/codeql/blob/main/java/ql/lib/semmle/code/java/dispatch/DispatchFlow.qll and ask any questions that arise from reading it :)
