osx-afl-llvm: A C repository from bnagy

============================================
Fast LLVM-based instrumentation for afl-fuzz
============================================

  (See ../docs/README for the general instruction manual.)

1) Introduction
---------------

The code in this directory allows you to instrument programs for AFL using
true compiler-level instrumentation, instead of the more crude
assembly-level rewriting approach taken by afl-gcc and afl-clang. This has
several interesting properties:

  - The compiler can make many optimizations that are hard to pull off when
    manually inserting assembly. As a result, some slow, CPU-bound programs will
    run up to around 2x faster.

    The gains are less pronounced for fast binaries, where the speed is limited
    chiefly by the cost of creating new processes. In such cases, the gain will
    probably stay within 10%.

  - The instrumentation is CPU-independent. At least in principle, you should
    be able to rely on it to fuzz programs on non-x86 architectures (after
    building afl-fuzz with AFL_NOX86=1).

  - Because the features relies on the internals of LLVM, it is clang-specific
    and will *not* work with GCC.

Once this implementation is shown to be sufficiently robust, it will probably
replace afl-clang. For now, it can be built separately and co-exists with the
original code.

The idea and much of the implementation comes from Laszlo Szekeres.

2) How to use
-------------

In order to leverage this mechanism, you need to have clang installed on your
system. You should also make sure that the llvm-config tool is in your path;
if not, be sure to set LLVM_CONFIG in the environment beforehand.

Unfortunately, some systems come without llvm-config or the LLVM development
headers. In such a case, you may have to do a full install on your own, or
download pre-built binaries from:

  http://llvm.org/releases/download.html

To build the instrumentation, type 'make'. This will generate binaries called
afl-clang-fast and afl-clang-fast++ in the parent directory. Once this is done,
you can instrument third-party code in a way similar to the standard operating
mode of AFL, e.g.:

  CC=/path/to/afl/afl-clang-fast ./configure [...options...]
  make

...or:

  CXX=/path/to/afl/afl-clang-fast++ ./configure [...options...]
  make

The tool honors roughly the same environmental variables as afl-gcc (see
../docs/env_variables.txt). This includes AFL_INST_RATIO, AFL_USE_ASAN,
AFL_HARDEN, and AFL_DONT_OPTIMIZE.

Note: if you want the LLVM helper to be installed on your system for all
users, you need to build it before issuing 'make install' in the parent
directory.

3) Gotchas, feedback, bugs
--------------------------

This is an early-stage mechanism, so field reports are welcome. You can send
bug reports to <afl-users@googlegroups.com>.

4) Bonus feature: deferred instrumentation
------------------------------------------

AFL tries to optimize performance by executing the targeted binary just once,
stopping it just before main(), and then cloning this "master" process to get
a steady supply of targets to fuzz.

Although this approach eliminates much of the OS-, linker- and libc-level
costs of executing the program, it does not always help with binaries that
perform other time-consuming initialization steps before getting to the input
file.

In such cases, it would be beneficial to initialize the forkserver a bit later,
once most of the initialization work is already done, but before the binary
attempts to read the fuzzed input and parse it. You can do this in LLVM mode in
a fairly simple way:

1) First, locate a suitable location in the code for the deferred initialization
   to take place. This needs to be done with *extreme* care to avoid breaking
   the binary. In particular, the program will probably malfunction if the
   initialization happens after:

   - The creation of any vital threads or child processes - since the forkserver
     can't clone them easily.

   - The creation of temporary files, network sockets, offset-sensitive file
     descriptors, and similar shared-state resources - but only provided that
     their state meaningfully influences the behavior of the program later on.

   - Any access to the fuzzed input, including the metadata about its size.

   Of course, things will also not work if the forkserver is never initialized
   at all; in this case, afl-fuzz will complain about failed handshake and bail
   out.

2) Next, insert the following global function declaration somewhere in the
   source file:

   void __afl_manual_init(void);

   ...and add a call to this function in the desired location before recompiling
   the project with afl-clang-fast (afl-gcc and afl-clang will *not* work).

3) Finally, be sure to set AFL_DEFER_FORKSRV=1 before invoking afl-fuzz.

Again, this feature is easy to misuse; be careful and double-test that the
coverage and the number of discovered paths is comparable between normal and
deferred runs. That said, when you do it well, you can see gains up to 10x or
so:

  https://groups.google.com/forum/#!topic/afl-users/fNMJHl7Fhzs
bnagy/osx-afl-llvm