Extend SymCC to perform forking exploration (not just concolic execution)

Question

Extend SymCC to perform forking exploration (not just concolic execution)

pgarba opened this issue 4 years ago · 5 comments

Hi,

I'm testing with a simple test case and it seems to fail. Any idea what's going wrong ?

I'm compiling with -O0 and the expected result should be 0x10. It works with >=-O2 as the
for loop will be simplified by LLVM.

Thanks,
Peter

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

uint64_t g_value = 0x10;

int main(int argc, char *argv[]) {
  uint64_t x;
  if (read(STDIN_FILENO, &x, sizeof(x)) != sizeof(x)) {
    printf("Failed to read x\n");
    return -1;
  }

  uint64_t y=0;
  for (int i=0;i<x;i++) {
    ++y;
  }

  printf("%s\n", (y == g_value) ? "yes" : "no");

  return 0;
}

Answer 1 · 2020-08-17T08:38:16.000Z

Hi @pgarba,

The behavior in your test case unfortunately depends on the input you pass to the program: if it's less than 0x10, it's very unlikely that you get 0x10 immediately (although you will eventually if you keep feeding the new test cases back to SymCC), while higher inputs are likely to give you 0x10 on the first attempt.

The reason is that, when the compiler doesn't simplify the for loop, new test cases are generated by the check i < x (because this is where the symbolic input x enters into a branch condition). Now, with a small x, say 0x00, the only time this check happens is when evaluating 0x00 < x to false - the solver will generate some larger x which indeed flips the outcome of the conditional, but it's very unlikely that you get exactly 0x10 (simply because the solver doesn't know that it's an important value at this point in the program). Now let's assume you run with a large input, say 0x20. Then the check happens in each iteration of the loop, one of them being i = 0x10 and thus evaluating 0x10 < x to true. At this point, the solver can flip the conditional by generating a test case x := 0x10 (which it does in my local tests). There is one more complication: The QSYM backend has a back-off mechanism that stops generating new test cases for the same conditional after some number of repetitions - if you're unlucky, it makes the analysis skip the comparison 0x10 < x... (But again, feeding SymCC's outputs back into the program will eventually get you there.)

In the optimized case, in contrast, the program probably sets y := x directly and then compares y == g_value, so it is easy for the solver to produce the test case x := g_value.

As far as I know, this is a problem in all symbolic executors. The best countermeasure, I suppose, is to have the compiler optimize the program as much as possible - this also helps to reduce the cost of solving, e.g., when the compiler replaces multiplications with shifts.

Hope this makes sense :)

Cheers,
Sebastian

Answer 2 · 2020-08-17T19:51:56.000Z

Thanks for your answer. After reading the SymCC paper it's a little bit more clear to me why this test case is failing.

What I still don't understand is why the true path for (y == g_value) will be not explored. Is it because you assume that y has become concrete as no test case has been generated that proves that its not ?

For this test cases I think KLEE can create a valid test cases that reaches the true path. It just takes longer to prove that one.

Thanks,
Peter

Answer 3 · 2020-08-18T08:16:39.000Z

What I still don't understand is why the true path for (y == g_value) will be not explored. Is it because you assume that y has become concrete as no test case has been generated that proves that its not ?

When the program compares y == g_value, both variables are concrete: g_value is a constant, and y is initialized to the constant 0 and possibly incremented by the constant 1. Its value is, of course, dependent on control flow, which in turn depends on symbolic input, but the executor lacks that abstract view.

For this test cases I think KLEE can create a valid test cases that reaches the true path. It just takes longer to prove that one.

KLEE lacks an abstract view of loops just like SymCC, but it explores all possible paths where SymCC only gives you the "closest" new paths. Effectively, KLEE solves the test by trying all possible numbers of loop iterations (which is why it can take very long). You get the same result by running SymCC in a loop that feeds the newly generated test cases back into the program.

Cheers,
Sebastian

Answer 4 · 2020-08-18T09:03:46.000Z

Would be it possible to have an option in SymCC to enforce exploration of all possible paths ? I'm asking as we would like to use SymCC in our tool SATURN to explorer all possible paths in a partial control flow graph that is lifted from an input binary.

Thanks,
Peter

Answer 5 · 2020-08-18T09:15:35.000Z

Hi,

Would be it possible to have an option in SymCC to enforce exploration of all possible paths ?

Yes this would be possible, and nice to have. I imagine this could rely on doing a fork() at each branch instead of only following the concrete one, plus a way to schedule processes. Using the copy on write memory would allow to limit the performance impact. However, I imagine there will be weirdness with sockets and file handles and I'm not sure how best to handle this.

Best,
Aurélien