google/AFL

llvm instrumentation duplication

wideglide opened this issue · 4 comments

It appears that #19 has some unintended results that produce multiple "afl_maybe_log" updates in a single basic block.

As an example, this is a basic block from sqlite3, function sqlite3VMPrintf, compiled with afl-clang-fast (clang-9). At the beginning of this basic block there are three updates to the shared map where there should only be one. I believe this is because the LLVM pass is run before optimizations. As AFL is already constrained by MAP_SIZE and sensitive to edge collisions, this seems like a bad result. Additionally, the extra instrumentation slows down execution.

Is there an alternative solution?

[image: disassembly of the basic block, showing three map updates at its start]

This is not ideal indeed. @ddcc, what do you think?

ddcc commented

Can you provide the IR and the corresponding source for that basic block? I'm guessing that some function calls were probably inlined, which could result in multiple consecutive map updates (I forget the details of how AFL works)?

I don't know if I'll have time to look into this, but it's an issue I've run into in the past with some other instrumentation. Our solution there was a two-pass approach: one pass very early, before optimization, to insert instrumentation, and another very late, after optimization, to remove excess instrumentation. But perhaps you could get away with a single-pass solution here, by running late after optimization like before, but with a custom transform to convert select instructions back to explicit branches? I don't know if there'd be other issues though.
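The cleanup half of the two-pass idea can be illustrated with a toy model in C. This is only a sketch, not an LLVM pass: a basic block is modeled as an array of operations, `OP_LOG` stands for one map update, and every name here (`dedup_logs`, `demo_remaining_logs`) is hypothetical. It assumes that keeping only the first update in a block is the desired result.

```c
#include <stddef.h>

/* Toy model: one basic block is a flat array of operations. */
enum op { OP_LOG, OP_WORK };   /* OP_LOG = one shared-map update */

/* Late cleanup step: keep the first OP_LOG in the block and drop
 * the rest, compacting the array in place. Returns the new length. */
static size_t dedup_logs(enum op *ops, size_t n) {
    size_t out = 0;
    int seen_log = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_LOG) {
            if (seen_log)
                continue;      /* excess update left behind by inlining */
            seen_log = 1;
        }
        ops[out++] = ops[i];
    }
    return out;
}

/* Builds a block shaped like the one in this issue (block entry plus
 * two inlined callees, each carrying its own update), runs the
 * cleanup, and returns how many updates remain. */
static size_t demo_remaining_logs(void) {
    enum op block[] = { OP_LOG, OP_WORK, OP_LOG, OP_WORK, OP_LOG, OP_WORK };
    size_t n = dedup_logs(block, sizeof block / sizeof block[0]);
    size_t logs = 0;
    for (size_t i = 0; i < n; i++)
        logs += (block[i] == OP_LOG);
    return logs;
}
```

A real implementation would have to operate on LLVM IR after optimization and decide which updates are actually redundant, which this model does not attempt.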

Yes, I think you are correct that this is a result of inlined function calls. AFL's instrumentation should insert code at the beginning of every basic block to update the shared map with the edge that was just traversed (roughly afl_area[cur_loc ^ prev_loc]++). This source/IR is ugly because it was injected with fake bugs via LAVA. Here's the source for the whole function:

static char *sqlite3VMPrintf(sqlite3 *db, const char *zFormat, va_list ap){
  char *z;
  char zBase[70];
  StrAccum acc;

  sqlite3StrAccumInit(&acc, db, zBase, ({((0x436d4767 == lava_get(50)) && fprintf(stderr, "\nLAVALOG: %d: %s:%d\n", 13034137, "sqlite3-pre.c", 10557)), (sizeof(zBase) + (lava_get(50) * (0x436d4767 == lava_get(50))));}),
                      db->aLimit[0]);
  acc.printfFlags = 0x01;
  sqlite3_str_vappendf(&acc, ({((0x69576c78 == lava_get(51)) && fprintf(stderr, "\nLAVALOG: %d: %s:%d\n", 13044286, "sqlite3-pre.c", 10560)), (zFormat + (lava_get(51) * (0x69576c78 == lava_get(51))));}), ap);
  z = sqlite3StrAccumFinish(&acc);
  if( acc.accError==7 ){
    sqlite3OomFault(db);
  }
  return z;
}

The lava_get() function calls are all inlined in the final assembly. The comparison constant is the same value in each form: 0x69576c78 in the C source, "iWlx" in the assembly above, and 1767337080 in the IR below. So of the three map updates in the assembly above, one is for the entry to the block and one each is for the two inlined function calls.
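To make the cost concrete, here is a minimal C sketch of the map update described above, run once for the block entry and once for each inlined callee entry. The location IDs are arbitrary illustrative constants (real AFL assigns random IDs at compile time), and `maybe_log` and `simulate` are hypothetical names, not AFL's actual symbols.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1u << 16)            /* AFL's default map size */

static uint8_t  afl_area[MAP_SIZE];    /* stand-in for the shared map */
static uint32_t prev_loc;              /* stand-in for the previous-location state */

/* One map update, as inserted at the start of an instrumented block. */
static void maybe_log(uint32_t cur_loc) {
    assert(cur_loc < MAP_SIZE);
    afl_area[cur_loc ^ prev_loc]++;
    prev_loc = cur_loc >> 1;
}

/* Simulates one execution of a block that contains `inlined_calls`
 * inlined callees, each bringing its own entry update with it.
 * Returns the number of distinct map entries that were touched. */
static unsigned simulate(unsigned inlined_calls) {
    static const uint32_t inlined_ids[2] = { 0x2222, 0x3333 };
    unsigned used = 0;

    assert(inlined_calls <= 2);
    memset(afl_area, 0, sizeof afl_area);
    prev_loc = 0;

    maybe_log(0x1111);                 /* update for the block itself */
    for (unsigned i = 0; i < inlined_calls; i++)
        maybe_log(inlined_ids[i]);     /* update carried in by inlining */

    for (uint32_t i = 0; i < MAP_SIZE; i++)
        used += (afl_area[i] != 0);
    return used;
}
```

With these IDs, `simulate(0)` touches one map entry and `simulate(2)` touches three, so a single pass through the block consumes three entries of the fixed-size map instead of one.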

Here's the IR corresponding to the basic block above:

45:                                               ; preds = %41, %22
  %46 = phi i1 [ false, %22 ], [ %44, %41 ]
  %47 = zext i1 %46 to i32
  %48 = load i8*, i8** %5, align 8
  %49 = call i32 @lava_get(i32 51)
  %50 = call i32 @lava_get(i32 51)
  %51 = icmp eq i32 1767337080, %50
  %52 = zext i1 %51 to i32
  %53 = mul i32 %49, %52
  %54 = zext i32 %53 to i64
  %55 = getelementptr inbounds i8, i8* %48, i64 %54
  store i8* %55, i8** %11, align 8
  %56 = load i8*, i8** %11, align 8
  %57 = load %struct.__va_list_tag*, %struct.__va_list_tag** %6, align 8
  call void @sqlite3_str_vappendf(%struct.sqlite3_str* %9, i8* %56, %struct.__va_list_tag* %57)
  %58 = call i8* @sqlite3StrAccumFinish(%struct.sqlite3_str* %9)
  store i8* %58, i8** %7, align 8
  %59 = getelementptr inbounds %struct.sqlite3_str, %struct.sqlite3_str* %9, i32 0, i32 5
  %60 = load i8, i8* %59, align 4
  %61 = zext i8 %60 to i32
  %62 = icmp eq i32 %61, 7
  br i1 %62, label %63, label %65

ddcc commented

Hmm, you could try marking that function with the always_inline attribute, as a workaround. It might get inlined early enough that it occurs before instrumentation.
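A sketch of what that workaround might look like in C, using the GCC/Clang `always_inline` attribute. The body of `lava_get()` is not shown in this thread, so the definition below (a lookup into a hypothetical `lava_slots` array) is only a stand-in, as are `lava_set` and `lava_offset`. Whether this actually avoids the duplicate map updates is untested here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backing store; the real LAVA runtime is not shown here. */
static uint32_t lava_slots[64];

/* Stand-in for lava_get(), forced inline even without optimization. */
__attribute__((always_inline))
static inline uint32_t lava_get(uint32_t slot) {
    return lava_slots[slot % 64];
}

/* Hypothetical setter, used only to exercise the example. */
static void lava_set(uint32_t slot, uint32_t val) {
    lava_slots[slot % 64] = val;
}

/* Same offset expression as in sqlite3VMPrintf: zero unless the slot
 * holds the magic value, in which case the value itself is the offset. */
static size_t lava_offset(void) {
    return lava_get(51) * (0x69576c78 == lava_get(51));
}
```

The idea is that if the callee is folded into its caller before the instrumentation pass runs, the callee no longer has a separate entry block to instrument.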