eurecom-s3/symcc

Potentially missing test cases generated from a simple loop

thuanpv opened this issue · 4 comments

Hello SymCC maintainers,

I have been trying to test how SymCC works on loops. To that end, I wrote a simple C program based on this example: https://github.com/AdaLogics/adacc/blob/master/examples/afl-sample.c. Following is my modified program (stored in example.c) and the function of interest is foo.

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

int foo(char *arr, int count) {
    int i = 0;

    for (i = 0; i < count; i++) {
      if (arr[i] == 'A' + i) return i;
    }

    if (*(int*)(arr) != 0x4d4f4f42) {
        return i;
    }
    return 0;
}

int main(int argc, char* argv[]) {
    // open file
    FILE *f = fopen(argv[1], "rb");

    // get file size
    fseek(f, 0, SEEK_END);
    long fsize = ftell(f);

    // read file contents
    fseek(f, 0, SEEK_SET);
    char *string = (char*)malloc(fsize + 1);
    fread(string, 1, fsize, f);
    fclose(f);

    // call into target
    int retval = foo(string, fsize);

    free(string);
    return retval;
}

I compiled this program and ran SymCC using the following commands

symcc example.c -o example
printf "xxxx" > seed
SYMCC_INPUT_FILE=seed ./example seed

I expected that SymCC would generate 5 test cases ("Axxx", "xBxx", "xxCx", "xxxD", and "BOOM"). However, somehow SymCC only generated 4 test cases, the string "xxxD" was missing.

It looked strange to me so I thought something had gone wrong. I modified the program and manually unrolled the loop 4 times; the modified foo function is listed below

int foo(char *arr, int count) {
    int i = 0;
    if (i < count) {
      if (arr[i] == 'A' + i) return i;
      i++;
    }

    if (i < count) {
      if (arr[i] == 'A' + i) return i;
      i++;
    }

    if (i < count) {
      if (arr[i] == 'A' + i) return i;
      i++;
    }

    if (i < count) {
      if (arr[i] == 'A' + i) return i;
      i++;
    }

    if (*(int*)(arr) != 0x4d4f4f42) {
        return i;
    }
    return 0;
}

Interestingly, when I ran SymCC with the new program, which was saved into example_loop_unrolled.c, SymCC generated 5 test cases as expected. The commands are similar to the above ones

symcc example_loop_unrolled.c -o example_loop_unrolled
SYMCC_INPUT_FILE=seed ./example_loop_unrolled seed

So I suspect that there are some issues in the loop-handling code of SymCC. Could you please confirm?

Thanks,

Thuan

If you are running SymCC (or SymQEMU) with the QSYM backend, then QSYM may be pruning the iterations of the loop. This would also explain why the unrolled loop does not show this behaviour: since it will (presumably) generate multiple branches (instead of one in a loop), each branch will be considered new by QSYM. See here.

Thanks @julihoh for your explanation. It makes sense to me now. It seems the QSYM backend applies some heuristics like the "loop bucketing" idea of AFL -- AFL only retains branches having specific "hit counts" like 1, 2, 3, 4, 8, 16, 32, ....

Exactly. Always happy to help :)

Just a quick update. I recently re-ran the same example using the simple backend instead of QSYM and it works as expected: the Z3-based backend generates exactly 5 test cases.