eurecom-s3/symcc

Path not found

vanhauser-thc opened this issue · 4 comments

This example file:

#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
int main(int argc, char *argv[]) {

  char    buf[1024];
  ssize_t i;
  if ((i = read(0, buf, sizeof(buf) - 1)) < 24) return 0;
  buf[i] = 0;
  if (buf[0] != 'A') return 0;
  if (buf[1] != 'B') return 0;
  if (buf[2] != 'C') return 0;
  if (buf[3] != 'D') return 0;
  if (memcmp(buf + 4, "1234", 4) || memcmp(buf + 8, "EFGH", 4)) return 0;
  if (strncmp(buf + 12, "IJKL", 4) == 0 && strcmp(buf + 16, "DEADBEEF") == 0)
    abort();
  return 0;

}

the abort() can be triggered with:
echo -en 'ABCD1234EFGHIJKLDEADBEEF\0'|./test

when I compile the test.c with symcc, and feed the most advanced input it finds, it can not get past "ABCD1234EFGHAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", so it cannot solve the last strcmp with DEADBEEF\0 - I guess this is a bug? or is this too complex to solve?

@vanhauser-thc this is most likely because our symbolic understanding of libc is incomplete. So when you call strncmp, SymCC can't trace the computations that happen in the function. There are several ways to address the issue:

  1. We can add strncmp and strcmp to the collection of wrapped libc functions and register the wrapper in the compiler. This is quick and easy, but it doesn't scale if your target uses many libc functions on symbolic data.
  2. Alternatively, one can build a fully instrumented libc (like, for example, KLEE does with uclibc). I refrained from doing this so far because it seems like a big burden on the user to set up their own instrumented libc; maybe we could automate the process to make it less tedious.
  3. Something I've done in the past is to cherry-pick individual libc functions from a libc implementation (I used musl) and include them in the target program. For example, the implementation of strncmp is self-contained, so you can just add it to your build (e.g., symcc -o test test.c /path/to/musl/src/string/strncmp.c).

Suggestions how to make such situations easier to deal with are highly welcome :)

I think string compare functions make sense to add (so solution 1) as these are a limited number of common functions. 2 sounds like too much overhead and 3) should only be needed to be done for very specific cases (e.g. a third party library that is binary only),

KLEE's libc may also be a good starting point for 1 and/or 3: https://github.com/klee/klee/tree/master/runtime/klee-libc

Has anyone already tried 2)? If all it takes to get full support of libc functions is to compile uclibc with SymCC from scratch then I don't mind the extra work. Are there any additional steps required to set up the instrumented libc that I'm missing?