Potentially missing test cases generated from a simple loop
thuanpv opened this issue · 4 comments
Hello SymCC maintainers,
I have been trying to test how SymCC works on loops. To that end, I wrote a simple C program based on this example: https://github.com/AdaLogics/adacc/blob/master/examples/afl-sample.c. Following is my modified program (stored in example.c) and the function of interest is foo.
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
int foo(char *arr, int count) {
int i = 0;
for (i = 0; i < count; i++) {
if (arr[i] == 'A' + i) return i;
}
if (*(int*)(arr) != 0x4d4f4f42) {
return i;
}
return 0;
}
int main(int argc, char* argv[]) {
// open file
FILE *f = fopen(argv[1], "rb");
// get file size
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
// read file contents
fseek(f, 0, SEEK_SET);
char *string = (char*)malloc(fsize + 1);
fread(string, 1, fsize, f);
fclose(f);
// call into target
int retval = foo(string, fsize);
free(string);
return retval;
}
I compiled this program and ran SymCC using the following commands
symcc example.c -o example
printf "xxxx" > seed
SYMCC_INPUT_FILE=seed ./example seed
I expected that SymCC would generate 5 test cases ("Axxx", "xBxx", "xxCx", "xxxD", and "BOOM"). However, somehow SymCC only generated 4 test cases, the string "xxxD" was missing.
It looked strange to me so I thought something had gone wrong. I modified the program and manually unrolled the loop 4 times; the modified foo function is listed below
int foo(char *arr, int count) {
int i = 0;
if (i < count) {
if (arr[i] == 'A' + i) return i;
i++;
}
if (i < count) {
if (arr[i] == 'A' + i) return i;
i++;
}
if (i < count) {
if (arr[i] == 'A' + i) return i;
i++;
}
if (i < count) {
if (arr[i] == 'A' + i) return i;
i++;
}
if (*(int*)(arr) != 0x4d4f4f42) {
return i;
}
return 0;
}
Interestingly, when I ran SymCC with the new program, which was saved into example_loop_unrolled.c, SymCC generated 5 test cases as expected. The commands are similar to the above ones
symcc example_loop_unrolled.c -o example_loop_unrolled
SYMCC_INPUT_FILE=seed ./example_loop_unrolled seed
So I suspect that there are some issues in the loop-handling code of SymCC. Could you please confirm?
Thanks,
Thuan
If you are running SymCC (or SymQEMU) with the QSYM backend, then QSYM may be pruning the iterations of the loop. This would also explain why the unrolled loop does not show this behaviour: since it will (presumably) generate multiple branches (instead of one in a loop), each branch will be considered new by QSYM. See here.
Thanks @julihoh for your explanation. It makes sense to me now. It seems the QSYM backend applies some heuristics like the "loop bucketing" idea of AFL -- AFL only retains branches having specific "hit counts" like 1, 2, 3, 4, 8, 16, 32, ....
Exactly. Always happy to help :)
Just a quick update. I recently re-ran the same example using the simple backend instead of QSYM and it works as expected: the Z3-based backend generates exactly 5 test cases.