[C++] [Question] How to detect taint on elements in a collection #18098
Comments
I'm guessing you might have edited your code snippet leaving out some information (the However, trying out this example, it would indeed seem we don't currently track taint through vectors. I will ask my colleagues if it's really the case. In the meantime, this seems to cover your simple example, by defining additional flow steps:
import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(VariableAccess).getTarget().getName() = "sensitive_data"
  }

  predicate isAdditionalFlowStep(DataFlow::Node lhs, DataFlow::Node rhs) {
    exists(ConstructorCall c |
      c.getTarget().getName() = ["vector", "initializer_list"] and
      c = rhs.asExpr() and
      c.getAnArgument() = lhs.asExpr()
    )
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Call c |
      c.getTarget().getName() = "potential_leak" and
      c.getArgument(0) = sink.asExpr()
    )
  }
}

module Flow = TaintTracking::Global<TaintConfig>;

from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, "flow to $@", sink, sink.toString()
Notice, however, that modelling all ways in which an element can be inserted into a vector might be tricky (
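To illustrate why covering every insertion path by hand is laborious, here is a small C++ sketch that stores one value into a std::vector through several standard entry points. The function name fill_every_way and the parameter name tainted are made up for illustration; each numbered line is a separate path that a query with hand-written flow steps would have to account for, and the list is not exhaustive.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: writes `tainted` into a vector through several
// different standard-library entry points and returns the result.
std::vector<int> fill_every_way(int tainted) {
    std::vector<int> v{tainted};           // 1. initializer-list constructor
    v.push_back(tainted);                  // 2. push_back
    v.emplace_back(tainted);               // 3. emplace_back
    v.insert(v.begin(), tainted);          // 4. insert at a position

    v.resize(v.size() + 1);                // grow by one value-initialized slot
    v[v.size() - 1] = tainted;             // 5. write through operator[]

    v.resize(v.size() + 1);                // grow by one more slot
    *std::prev(v.end()) = tainted;         // 6. write through an iterator

    std::vector<int> copy(v.begin(), v.end());  // 7. iterator-range constructor
    assert(copy.size() == v.size());       // the copy carries every element over
    return copy;
}
```

Every element of the returned vector holds the input value, although none of the seven statements shares a call target with the others.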
|
Hi @redsun82, thank you very much for looking into this. I did know that you modeled collections, but I didn't realize they overtainted to the container itself. Nice. Sorry for the buggy code: I was trying to fix the indentation after copying it and deleted too much by accident. So it seems I just ran into one of the unsupported methods 😅. Anyway, your workaround appears to work, thanks a lot. Since I have you here, I have two related questions:
|
Hi @JustusAdam
No worries!
Well, not really: everything that models do can be done with explicit and careful coding of the About
Well, again, you could do that with additional taint steps. For example this (untested) snippet:
import semmle.code.cpp.ir.dataflow.internal.DataFlowPrivate

predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
  // anything that generates a store step into a vector type is now also an
  // additional flow step that taints the whole vector
  storeStep(node1, _, node2) and
  node2.getType().hasQualifiedName("std", "", "vector")
}
but in general we wouldn't recommend it.
How did Just for clarification, the fact that As for |
So this is kind of what I meant by "taint the whole object". I didn't know this is what happened already internally. But this is only for elements that are stored specially via the
So I hand rolled this vector implementation with a couple of test cases (all below) and ran the taint analysis. The long and short of it is that using indexing worked, taint inflicted via
template <typename T>
class vec_iter
{
    T *__p;

public:
    vec_iter(T *p) : __p(p) {}
    bool operator!=(vec_iter<T> &other)
    {
        return __p != other.__p;
    }
    vec_iter<T> &operator++()
    {
        __p++;
        return *this;
    }
    T &operator*()
    {
        return *__p;
    }
};

template <typename T>
class vec
{
    T *__start;
    T *__end;
    uint64_t cap;

public:
    vec()
    {
        __start = 0;
        __end = __start;
        cap = 0;
    };
    void ensure_cap(uint64_t additional)
    {
        auto current = (uint64_t)(__end - __start);
        auto target = current + additional;
        if (target > cap)
        {
            auto new_cap = cap * 2;
            if (new_cap == 0)
            {
                new_cap = 4;
            }
            while (new_cap < target)
                new_cap *= 2;
            auto old_start = __start;
            __start = new T[new_cap];
            __end = __start + current;
            memcpy(__start, old_start, cap * sizeof(T));
            free(old_start);
            cap = new_cap;
        }
    }
    void push_back(const T &elem)
    {
        ensure_cap(1);
        *__end = elem;
        __end++;
    }
    vec_iter<T> begin()
    {
        return vec_iter(__start);
    }
    vec_iter<T> end()
    {
        return vec_iter(__end);
    }
    T &operator[](uint64_t n)
    {
        return *(__start + n);
    }
};

int main(int argv, char **argc)
{
    int result = 0;

    // Indexing, this works
    vec<int> v1 = vec<int>();
    v1[0] = source();
    result += target(v1[0]);

    // Push_back, this doesn't work
    vec<int> v2;
    v2.push_back(source());
    auto elem = v2[0];
    result += target(elem);

    return result;
}
Query:
import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module SourceSinkCallConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { isSourceCall(source.asExpr()) }

  predicate isSink(DataFlow::Node sink) {
    isTargetOperand(sink.asExpr()) or isTargetOperand(sink.asIndirectExpr())
  }
}

predicate isTargetOperand(Expr o) {
  exists(FunctionCall call | isTargetFunction(call.getTarget()) and call.getArgument(0) = o)
}

predicate isSourceCall(Expr e) { e.(FunctionCall).getTarget().getName() = "source" }

module SourceSinkCallTaint = TaintTracking::Global<SourceSinkCallConfig>;

from DataFlow::Node source, DataFlow::Node sink, int source_line, int sink_line
where
  SourceSinkCallTaint::flow(source, sink) and
  source_line = source.getLocation().getStartLine() and
  sink_line = sink.getLocation().getStartLine()
select source, source_line, sink, sink_line
|
Hi @JustusAdam, I see. It's probably the case that taint analysis cannot see through the logic of I couldn't help but notice a couple of oddities (though I understand yours is just demo code):
|
Hey, thanks for the explanation. I don't use manual memory management in C++ much, and I only created this example to understand CodeQL, not to actually use it; I only tested that pushing into the vector works 😅. With proper Anyway I don't think the problem is |
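Looking only at the vec posted earlier in the thread, two things stand out: its buffer is allocated with new T[] but released with free, and v1[0] = source() indexes a vec that has not allocated any storage yet. A minimal C++ sketch of a variant that avoids both, assuming a std::unique_ptr<T[]> buffer is acceptable, might look like this. The name safe_vec and its members are made up for illustration, and nothing here says how the taint analysis would treat it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical cleaned-up variant of the demo container.
template <typename T>
class safe_vec {
    std::unique_ptr<T[]> data_;  // freed with delete[], matching new T[]
    std::size_t size_ = 0;
    std::size_t cap_ = 0;

    // Grow the buffer so that `additional` more elements fit.
    void ensure_cap(std::size_t additional) {
        std::size_t target = size_ + additional;
        if (target <= cap_) return;
        std::size_t new_cap = cap_ == 0 ? 4 : cap_ * 2;
        while (new_cap < target) new_cap *= 2;
        std::unique_ptr<T[]> fresh(new T[new_cap]);
        // Move element by element so non-trivially-copyable T also works.
        for (std::size_t i = 0; i < size_; ++i) fresh[i] = std::move(data_[i]);
        data_ = std::move(fresh);
        cap_ = new_cap;
    }

public:
    void push_back(const T &elem) {
        T copy = elem;           // copy first: elem may refer into the old buffer
        ensure_cap(1);
        data_[size_++] = std::move(copy);
    }
    T &operator[](std::size_t n) {
        assert(n < size_);       // indexing past the stored elements is a bug
        return data_[n];
    }
    std::size_t size() const { return size_; }
    T *begin() { return data_.get(); }
    T *end() { return data_.get() + size_; }
};
```

With this variant the indexing test case would need a push_back (or some other way of creating the element) before v1[0] can be written.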
I haven't fully read the thread so if this is already mentioned you can skip it, but when working with tainted elements of a collection or fields of a struct, you typically want to implement allowImplicitRead to allow our analysis to assume arbitrary read steps at the sink when the collection or structure is passed to the sink (the collection/struct might not be tainted, but a member could be). Here is an example implementation
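To make that scenario concrete, here is a small C++ sketch of the situation: only one element (or one field) holds tainted data, yet the argument that reaches the sink is the whole collection or struct. All names (source, sink_vec, sink_record, Record) are hypothetical stand-ins and are not taken from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type with one sensitive field.
struct Record {
    int id;
    std::string secret;
};

// Stand-in for a taint source.
int source() { return 1234; }

// Stand-in sinks: each receives the object as a whole and reads its
// members internally, out of sight of the caller.
int sink_vec(const std::vector<int> &v) {
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}
std::size_t sink_record(const Record &r) { return r.secret.size(); }

int demo_collection() {
    std::vector<int> v{1, 2};
    v.push_back(source());   // only the third element is tainted
    assert(v.size() == 3);
    return sink_vec(v);      // the sink argument is `v`, not the element
}

std::size_t demo_struct() {
    Record r{7, std::to_string(source())};  // only `secret` is tainted
    return sink_record(r);                  // the sink argument is `r`
}
```

In both functions the expression at the sink is the container itself, which is the case the comment above says allowImplicitRead is meant for.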
|
I am trying to detect the flow into potential_leak in the following, simplified code. This is just the minimal example; the vector can be constructed in any way, e.g. with a series of push_back calls or via an iterator, and I'm trying to find a way to reliably detect taint on any of the elements at the sink location. Also assume that I do not have access to the source code of potential_leak and thus cannot detect the taint when the elements are accessed.
My simplified query is
However, this does not detect the flow. Is there some way to select the elements inside of v as sinks for this query?
CodeQL version: 2.19.3
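As a purely hypothetical stand-in for the kind of program described, the following C++ sketch uses the names that appear in the query from the first reply (sensitive_data, potential_leak) and the vector v mentioned above. The element type, the signature of potential_leak, and its body are assumptions; in the scenario described, that body would not be available to the analysis.

```cpp
#include <cassert>
#include <vector>

// Assumed stand-in for the opaque function; its real source is taken to be
// unavailable, so only the call site can be inspected.
int potential_leak(const std::vector<int> &v) {
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}

// Hypothetical driver: the tainted value enters the vector as an element,
// and the vector as a whole is passed to the sink.
int run_example(int sensitive_data) {
    std::vector<int> v{sensitive_data};
    assert(v.size() == 1);   // exactly one element, the tainted one
    return potential_leak(v);
}
```

Here the tainted value never appears directly as an argument of potential_leak; only v does.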