Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question - Variable initialization #7827

Open
QWERTYz12 opened this issue Feb 3, 2022 · 12 comments
Open

Question - Variable initialization #7827

QWERTYz12 opened this issue Feb 3, 2022 · 12 comments
Labels
question Further information is requested

Comments

@QWERTYz12
Copy link

QWERTYz12 commented Feb 3, 2022

Variable initialization (cpp)

Hi, I'm having problems trying to catch the first assignment of a variable in a function's argument.

For instance, catching the struct->elem variable from recvfrom arg 2, and finding its first assignment whereby it is equated to ptr. Ultimately, I want to catch all ptr variables that are used in the recvfrom function.

struct->elem  = ptr
recvfrom(df, struct->elem...)

Thanks for any help!

@QWERTYz12 QWERTYz12 added the question Further information is requested label Feb 3, 2022
@aibaars
Copy link
Contributor

aibaars commented Feb 3, 2022

Thanks for your question. If I understand correctly you are trying to find where the pointer-values used as the second argument of recvfrom come from. Have you tried using the DataFlow library?

@QWERTYz12
Copy link
Author

Thanks for the reply!

Yup I am currently using global taint tracking to catch the above code as isSource to memcpy isSink. But the second argument in recvfrom does not flow straight into memcpy as it has been dereferenced in the later part before the memcpy function. Hence, I am having difficulties directing the data flow from recvfrom argument 2 to memcpy argument 3.

To solve this, I am trying to catch the recvfrom argument 2 variable and find its assignment. I would then try to use isAdditionalTaintStep to find instances where the ptr variable has been dereferenced and lastly, redirect the flow back to memcpy argument 3. Is this the right way to find the bug? If so, how do I find the variable assignment of the recvfrom argument 2 variable in the isSource predicate?

The current flow of the bug:

ptr = mempool_alloc(...)
struct -> elem = ptr
recvfrom(df, struct->elem...)
len = *ptr - 2
memcpy(struct2->elem, ptr, len)

@MathiasVP
Copy link
Contributor

Hi @QWERTYz12,

It sounds like you want flow from the value that’s pointed to by struct->elem after the call to recvfrom, and into the len argument supplied to memcpy. Is that correct?

This isn't something our global taint tracking libraries support super well. The issue is that we see a "write" to struct->elem followed by a read of *ptr, and we don't look for possible aliasing relations that has been established before the source (i.e., before the call to recvfrom in this case). So we miss this flow.

One hack that you could do is to supply your taint configuration with an additional taint step that transfers taint between pointers which we know will have the same value at runtime. This can be done using the semmle.code.cpp.valuenumbering.GlobalValueNumbering library like this:

/**
 * @kind path-problem
 */

import semmle.code.cpp.dataflow.TaintTracking
import semmle.code.cpp.valuenumbering.GlobalValueNumbering
import DataFlow::PathGraph

class Conf extends TaintTracking::Configuration {
  Conf() { this = "Conf" }

  override predicate isSource(DataFlow::Node source) {
    // The source is a value that's coming out of `recvfrom`'s second argument.
    source.asDefiningArgument() =
      any(Call call | call.getTarget().hasName("recvfrom")).getArgument(1)
  }

  override predicate isSink(DataFlow::Node sink) {
    // The sink is a value that goes into `memcpy`'s third argument.
    sink.asExpr() = any(Call call | call.getTarget().hasName("memcpy")).getArgument(2)
  }

  override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
    // Transfer flow from any value that's coming out of a function call, and into expressions
    // which we know will have the same value at that point in the execution.
    globalValueNumber(node1.asDefiningArgument()).getAnExpr() = node2.asExpr()
  }
}

from Conf conf, DataFlow::PathNode source, DataFlow::PathNode sink
where conf.hasFlowPath(source, sink)
select sink.getNode(), source, sink, ""

This solution might not be very efficient on all projects, and it does potentially provide some false flow. But at least it gets the flow you want in this case!

In the future, I hope we'll have a better solution to this issue so that you don't need to specify this additional taint step.

I hope this helps :)

@QWERTYz12
Copy link
Author

Hi @MathiasVP,

Thanks for the explanation and query! Appreciate it :)

Yup I want to find the path flow from the value pointed to by struct->elem after the call to recvfrom and into len = * ptr -2, followed by the memcpy function argument 3.

I tried to run your query on my codebase but wasn't able to catch the bug. I realised that both ptr and struct->elem have different values at runtime (eg: there's a line of ptr += 16 before len = *ptr -2). Is that why the query was not able to catch the bug since ptr and struct->elem do not have the same value anymore? If so, how do I solve this issue?

Flow of the bug:

ptr = mempool_alloc(...);
struct -> elem = ptr;
recvfrom(df, struct->elem...);
...
struct -> elem2 = *ptr; ptr++;
struct -> elem3  = *ptr; ptr++;
struct -> elem4 = *ptr; ptr += 2;
ptr += 16;
...
id = *ptr; ptr++;
len = *ptr - 2;
memcpy(struct2->elem, ptr, len);

@QWERTYz12
Copy link
Author

Hi, any updates on this question?

@MathiasVP
Copy link
Contributor

Hi @QWERTYz12,

Sorry. I missed your update on the issue. The query I created above does find the flow from struct->elem on line 15 to len on line 24 in this example:

char* mempool_alloc();
int recvfrom(int, char* buf, int len);
void* memcpy(void*, const void*, unsigned long);

typedef struct {
  char* elem;
  char elem2;
  char elem3;
  char elem4;
} S;

void test(int df, S* s) {
  char* ptr = mempool_alloc();
  s->elem = ptr;
  recvfrom(df, s->elem, 10);
  
  s->elem2 = *ptr; ptr++;
  s->elem3  = *ptr; ptr++;
  s->elem4 = *ptr; ptr += 2;
  ptr += 16;
  
  char id = *ptr; ptr++;
  int len = *ptr - 2;
  memcpy(s->elem, ptr, len);
}

Can you show a complete example (that compiles) where we don't find the flow.

@QWERTYz12
Copy link
Author

Hi @MathiasVP,

Sure no problem.

The project that I'm working on is xebd/accel-ppp where the query don't find the flow. The code can be found in /opt/src/accel-pppd/radius/packet.c from line 142 n = recvfrom(fd, pack->buf, REQ_LENGTH_MAX, 0, addr, &addr_len); to line 261 memcpy(&attr->val.integer, ptr, len);

Thanks for the help!

@QWERTYz12
Copy link
Author

Hi! Any updates on this question?

@MathiasVP
Copy link
Contributor

MathiasVP commented Feb 21, 2022

Hi @QWERTYz12,

I'm trying to figure out why we're not finding the flow. It's a bit more complicated than I first anticipated based on the snippet with the flow of the bug :) I hope I'll have an update on it soon.

Part of the issue is that ptr and struct->elem might have different values at runtime, so I'm trying to figure out what the best approach to overcoming this is.

@QWERTYz12
Copy link
Author

Hey @MathiasVP! Are there any updates on this issue?

@MathiasVP
Copy link
Contributor

No update yet. I'll be sure to post an update in here once I have something :)

@QWERTYz12
Copy link
Author

Sure, thanks alot @MathiasVP!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants