Seed Selection for Successful Fuzzing

This repository contains the artifact for our ISSTA 2021 paper "Seed Selection for Successful Fuzzing". Our primary artifact is the OptiMin corpus minimizer, but we also provide the infrastructure needed to reproduce our fuzzing experiments.

Getting Started

Set up your environment

The instructions below assume a modern Ubuntu release (18.04 to 20.04) and Python 3.6 to 3.8:

# Install prerequisites
sudo apt update
sudo apt install -y git docker.io python3-venv 

# Add yourself to the docker group (don't forget to log out and log back in so
# that the group changes take effect)
sudo usermod -aG docker $USER

# Set up a virtualenv
python3 -m venv seed_selection
source seed_selection/bin/activate
pip3 install wheel

# Get this repo
git clone https://github.com/HexHive/fuzzing-seed-selection
pip3 install fuzzing-seed-selection/scripts

Build OptiMin

OptiMin is our SAT-based corpus minimization tool. It supports coverage generated by both AFL and llvm-cov (only AFL is used in the paper). OptiMin can also use either Z3 or EvalMaxSAT as its solver backend (only EvalMaxSAT is used in the paper). To build:

docker build -t seed-selection/optimin fuzzing-seed-selection/optimin
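OptiMin passes a WCNF (weighted CNF) instance to its MaxSAT backend. The README does not document OptiMin's exact encoding, so the following is a minimal sketch of one standard formulation. Each seed is a Boolean variable. Each covered edge becomes a hard clause requiring at least one kept seed that covers it. Each seed contributes a soft clause, "drop this seed", weighted by that seed's weight (1 for an unweighted minimization). The output uses the classic DIMACS WCNF layout (`p wcnf <vars> <clauses> <top>`), in which any clause weighted `top` is hard. `to_wcnf` is a hypothetical helper and is not part of OptiMin.

```python
from typing import Dict, List, Optional, Set


def to_wcnf(coverage: Dict[str, Set[int]],
            weights: Optional[Dict[str, int]] = None) -> str:
    """Hypothetical helper (not OptiMin's code): encode seed selection as WCNF.

    Variable i (1-based) is true iff the i-th seed (in sorted order) is kept.
    Hard clauses: every covered edge must be covered by some kept seed.
    Soft clauses: (NOT seed_i) with weight w_i, so a MaxSAT solver minimizes
    the total weight of the kept seeds.
    """
    seeds = sorted(coverage)
    var = {seed: i + 1 for i, seed in enumerate(seeds)}
    weights = weights or {seed: 1 for seed in seeds}  # unweighted by default

    # Invert the coverage map: edge -> variables of the seeds that cover it.
    covering: Dict[int, List[int]] = {}
    for seed in seeds:
        for edge in coverage[seed]:
            covering.setdefault(edge, []).append(var[seed])

    # "top" must exceed the sum of all soft weights so hard clauses dominate.
    top = sum(weights[s] for s in seeds) + 1

    clauses = []
    for edge in sorted(covering):
        clauses.append(f"{top} " + " ".join(map(str, covering[edge])) + " 0")
    for seed in seeds:
        clauses.append(f"{weights[seed]} -{var[seed]} 0")

    header = f"p wcnf {len(seeds)} {len(clauses)} {top}"
    return "\n".join([header] + clauses) + "\n"
```

For example, `to_wcnf({"a": {1, 2}, "b": {2, 3}, "c": {3}})` begins with `p wcnf 3 6 4`: three hard clauses (one per edge) and three unit-weight soft clauses. A solver's optimal assignment marks the seeds to keep.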

Run OptiMin

OptiMin takes a large "collection corpus" and selects a subset of its seeds to use for fuzzing. The selection is based on the code coverage of each seed in the collection corpus.
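The following sketch makes the selection objective concrete. We assume the goal is the smallest subset of seeds whose combined coverage equals that of the whole collection corpus. The code finds such a subset by exhaustive search, which is exponential and only usable on toy inputs. OptiMin solves the same kind of problem with a SAT/MaxSAT solver instead. `minimum_corpus` is a hypothetical name used only for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def minimum_corpus(coverage: Dict[str, Set[int]]) -> Tuple[str, ...]:
    """Illustrative only: return a smallest seed subset covering every edge.

    Tries subsets of increasing size, so the first hit is a minimum.
    Exponential in the number of seeds; ties go to the first subset
    in sorted-name order.
    """
    seeds = sorted(coverage)
    universe = set().union(*coverage.values()) if coverage else set()
    for k in range(len(seeds) + 1):
        for subset in combinations(seeds, k):
            covered = set().union(*(coverage[s] for s in subset))
            if covered == universe:
                return subset
    return tuple(seeds)  # not reached: the full corpus always covers itself


# Four seeds, five edges: two seeds suffice.
cov = {"a": {1, 2, 3}, "b": {3, 4, 5}, "c": {1, 4}, "d": {2, 5}}
print(minimum_corpus(cov))  # ('a', 'b')
```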

We provide tools (based on afl-showmap) to generate code coverage information for a given corpus, but this can be time-consuming for large corpora. We therefore also provide precomputed seed traces as HDF5 archives.

For example, to perform a corpus minimization based on the Google FTS FreeType2 coverage:

Download the coverage HDF5 archive

wget https://datacommons.anu.edu.au/DataCommons/rest/records/anudc:6106/data/afl-showmap-coverage/fts/freetype2.hdf5

Expand the HDF5 using the expand_hdf5_coverage.py script

expand_hdf5_coverage.py -i freetype2.hdf5 -o /tmp/freetype2

# Expected output:
#
# 466 seeds to extract
# Expanding freetype2.hdf5: 100%
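You may want to inspect the expanded traces yourself. The sketch below assumes each file in the output directory is named after its seed and uses afl-showmap's textual format, with one `edge_id:hit_count` line per covered edge (for example `012345:1`). That format is an assumption, so check a few files on your system before relying on it. `parse_showmap_trace` and `load_corpus_traces` are hypothetical helpers and are not part of this repository's scripts.

```python
import os
from typing import Dict


def parse_showmap_trace(text: str) -> Dict[int, int]:
    """Parse assumed afl-showmap-style text ("edge:count" per line) into {edge: count}."""
    trace: Dict[int, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        edge, sep, count = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'edge:count', got {line!r}")
        trace[int(edge)] = int(count)  # int() also rejects non-numeric fields
    return trace


def load_corpus_traces(directory: str) -> Dict[str, Dict[int, int]]:
    """Map each regular file in `directory` (e.g. /tmp/freetype2) to its parsed trace."""
    traces: Dict[str, Dict[int, int]] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path) as f:
                traces[name] = parse_showmap_trace(f.read())
    return traces
```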

Perform an unweighted minimization based on edges only (not hit counts)

docker run -v /tmp/freetype2:/tmp/freetype2   \
  seed-selection/optimin -e /tmp/freetype2

# Expected output:
#
# afl-showmap corpus minimization
#
# [############################################################] 100% Reading seed coverage
# [############################################################] 100% Generating clauses
# [*] Running Optimin on /tmp/freetype2
# [*] Running EvalMaxSAT on WCNF
# [+] EvalMaxSAT completed
# [*] Parsing EvalMaxSAT output
# [+] Solution found for /tmp/freetype2
# 
# [+] Total time: 0.01 sec
# [+] Num. seeds: 37
#
# ...

Perform an unweighted minimization including edge hit counts

docker run -v /tmp/freetype2:/tmp/freetype2  \
  seed-selection/optimin /tmp/freetype2

# Expected output:
#
# afl-showmap corpus minimization
#
# [############################################################] 100% Reading seed coverage
# [############################################################] 100% Generating clauses
# [*] Running Optimin on /tmp/freetype2
# [*] Running EvalMaxSAT on WCNF
# [+] EvalMaxSAT completed
# [*] Parsing EvalMaxSAT output
# [+] Solution found for /tmp/freetype2
#
# [+] Total time: 0.01 sec
# [+] Num. seeds: 53
#
# ...
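The hit-count run keeps more seeds (53 vs. 37) because each distinct (edge, hit-count class) pair becomes something the minimized corpus must preserve. The sketch below assumes the hit counts are bucketed the way AFL does it, into the classes 1, 2, 3, 4-7, 8-15, 16-31, 32-127, and 128+. Here these classes are numbered 1 to 8, although the labels that afl-showmap prints may differ. `hit_class` and `coverage_elements` are hypothetical helpers, and OptiMin's internal representation may differ.

```python
from typing import Dict, Set, Tuple

# Lower bounds of AFL's eight hit-count classes (assumed bucketing):
# 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+  ->  classes 1..8
_LOWER_BOUNDS = (1, 2, 3, 4, 8, 16, 32, 128)


def hit_class(count: int) -> int:
    """Return the 1-based class of a raw hit count (>= 1)."""
    if count < 1:
        raise ValueError("a covered edge has a hit count >= 1")
    cls = 0
    for i, lower in enumerate(_LOWER_BOUNDS, start=1):
        if count >= lower:
            cls = i
    return cls


def coverage_elements(trace: Dict[int, int], use_hit_counts: bool) -> Set[Tuple[int, int]]:
    """Elements a minimizer must preserve for one seed's {edge: count} trace.

    Edges only: (edge, 0), where 0 means "covered at least once".
    With hit counts: (edge, class), so differing classes count separately.
    """
    if use_hit_counts:
        return {(edge, hit_class(c)) for edge, c in trace.items()}
    return {(edge, 0) for edge in trace}


# Seed a hits edge 7 once, seed b hits it five times.
traces = {"a": {7: 1}, "b": {7: 5}}
for use_counts in (False, True):
    universe = set().union(*(coverage_elements(t, use_counts) for t in traces.values()))
    print(use_counts, sorted(universe))
# False [(7, 0)]          -> one seed is enough
# True [(7, 1), (7, 4)]   -> both seeds are needed
```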

Download the file weights (i.e., file sizes)

wget https://datacommons.anu.edu.au/DataCommons/rest/records/anudc:6106/data/weights/ttf.csv
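The provided ttf.csv uses file sizes as weights. To build an analogous file for your own collection corpus, you could start from the minimal sketch below. It is not a script shipped with this repository, and `size_weights` and `write_weights_csv` are hypothetical names. The sketch writes one `FILE,WEIGHT` row per file, using the bare filename and the size in bytes. Weights must be integers >= 1, so empty files are clamped to 1. Write the CSV outside the corpus directory so that it is not picked up as a seed.

```python
import csv
import os
from typing import List, Tuple


def size_weights(corpus_dir: str) -> List[Tuple[str, int]]:
    """Return (filename, size in bytes) for every regular file in corpus_dir."""
    rows = []
    for name in sorted(os.listdir(corpus_dir)):
        path = os.path.join(corpus_dir, name)
        if os.path.isfile(path):
            # Weights must be >= 1, so clamp empty files to 1.
            rows.append((name, max(1, os.path.getsize(path))))
    return rows


def write_weights_csv(corpus_dir: str, out_path: str) -> None:
    """Write FILE,WEIGHT rows (size-based weights) to out_path."""
    rows = size_weights(corpus_dir)  # list the corpus before creating the output
    with open(out_path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
```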

Perform a weighted minimization based on file size and edges only

docker run -v /tmp/freetype2:/tmp/freetype2 -v $(pwd):/tmp   \
  seed-selection/optimin -e -w /tmp/ttf.csv /tmp/freetype2

# Expected output:
#
# afl-showmap corpus minimization
#
# [*] Reading weights from `/tmp/ttf.csv`... 0s
# [############################################################] 100% Calculating top
# [############################################################] 100% Reading seed coverage
# [############################################################] 100% Generating clauses
# [*] Running Optimin on /tmp/freetype2
# [*] Running EvalMaxSAT on WCNF
# [+] EvalMaxSAT completed
# [*] Parsing EvalMaxSAT output
# [+] Solution found for /tmp/freetype2
#
# [+] Total time: 0.01 sec
# [+] Num. seeds: 37
#
# ...

Detailed Description

Additional Files

Our collection corpora are too large to store in a Git repository, so we host ancillary data in ANU's DataCommons repository (record anudc:6106, from which the coverage and weight files above are downloaded).

Tracing Code Coverage

Corpus minimization is typically based on some notion of "code coverage". To ensure a fair and uniform comparison across the three corpus minimization tools (afl-cmin, MinSet, and OptiMin), we use AFL's notion of edge coverage. This coverage information can be generated as follows:

Compile your target with AFL instrumentation. See the AFL documentation for instructions on how to do this.
Run replay_seeds.py with your target program and your collection corpus. This will generate an HDF5 archive containing coverage information that can then be minimized.
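AFL's technical documentation describes how its instrumentation records edge coverage. Each basic block gets a compile-time random ID `cur`, and at run time AFL increments `map[cur ^ prev]` and then sets `prev = cur >> 1`. The sketch below simulates that scheme in Python for a given sequence of block IDs. It illustrates what the instrumented binary records and is not part of replay_seeds.py. `afl_edge_hits` is a hypothetical helper, and the block IDs are made-up stand-ins for AFL's random values.

```python
from typing import Dict, List

MAP_SIZE = 1 << 16  # AFL's default 64 KiB coverage map


def afl_edge_hits(block_path: List[int]) -> Dict[int, int]:
    """Simulate AFL-style edge counting over a sequence of basic-block IDs.

    Block IDs stand in for AFL's compile-time random values (assumed < MAP_SIZE).
    Returns {map index: hit count}.
    """
    hits: Dict[int, int] = {}
    prev = 0
    for cur in block_path:
        edge = (cur ^ prev) % MAP_SIZE
        hits[edge] = hits.get(edge, 0) + 1
        # The shift keeps A->B distinct from B->A, and keeps self-loops
        # (A->A) from all collapsing onto index 0.
        prev = cur >> 1
    return hits


# Blocks 0x10 and 0x20 executed alternately: the 0x10 -> 0x20 edge is hit twice.
print(afl_edge_hits([0x10, 0x20, 0x10, 0x20]))  # {16: 1, 40: 2, 0: 1}
```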

Corpus Minimization

Our paper surveys three corpus minimization tools: OptiMin, afl-cmin, and MinSet. Below, we explain in more detail how to use these tools and reproduce our results.

OptiMin

Instructions for running OptiMin are given above. A weighted minimization is performed by supplying a weights CSV file to OptiMin's -w option. This weights file has the following format:

FILE_1,WEIGHT
FILE_2,WEIGHT
FILE_3,WEIGHT
FILE_4,WEIGHT
FILE_5,WEIGHT

Here, FILE_1, FILE_2, ... are the names of files within the corpus directory (give only the filename, without the corpus directory path), and WEIGHT is an unsigned integer >= 1. We provide weights for our collection corpora in the DataCommons repository.
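Before passing a hand-made weights file to -w, you can check it against the constraints above: bare filenames and integer weights >= 1. The following is a minimal sketch. `read_weights` and `missing_weights` are hypothetical helpers. The README does not say whether OptiMin requires a weight for every seed, so `missing_weights` only reports the gaps.

```python
import csv
import os
from typing import Dict, Iterable, Set


def read_weights(lines: Iterable[str]) -> Dict[str, int]:
    """Parse FILE,WEIGHT rows and enforce the documented format."""
    weights: Dict[str, int] = {}
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {lineno}: expected FILE,WEIGHT, got {row!r}")
        name, raw = row[0].strip(), row[1].strip()
        if os.path.basename(name) != name:
            raise ValueError(f"line {lineno}: {name!r} must be a bare filename, not a path")
        weight = int(raw)  # raises ValueError for non-integers
        if weight < 1:
            raise ValueError(f"line {lineno}: weight must be >= 1, got {weight}")
        weights[name] = weight
    return weights


def missing_weights(weights: Dict[str, int], corpus_files: Iterable[str]) -> Set[str]:
    """Corpus files that have no entry in the weights file."""
    return set(corpus_files) - set(weights)
```

For a file on disk, `read_weights(open("ttf.csv"))` works because `csv.reader` accepts any iterable of lines.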

afl-cmin

afl-cmin is AFL's built-in corpus minimization tool. afl_cmin.py wraps afl-cmin so that it outputs the names of the seeds in the minimized corpus, rather than copying the seeds and wasting storage.
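Unlike OptiMin, afl-cmin is greedy. As we understand its script, it first picks, for each coverage tuple, the smallest file that contains it. It then walks the tuples from rarest to most common and keeps a tuple's candidate whenever that tuple is not yet covered. The sketch below is a simplified approximation of that strategy. It does not call afl-cmin or afl_cmin.py, real details such as tie-breaking may differ, and `afl_cmin_like` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Set


def afl_cmin_like(coverage: Dict[str, Set[int]], sizes: Dict[str, int]) -> List[str]:
    """Simplified greedy minimization in the spirit of afl-cmin (approximation)."""
    # 1. Best candidate per tuple: the smallest file containing it (ties by name).
    best: Dict[int, str] = {}
    for f in sorted(coverage, key=lambda f: (sizes[f], f)):
        for t in coverage[f]:
            best.setdefault(t, f)

    # 2. Visit tuples from rarest to most common; keep candidates for uncovered tuples.
    freq = Counter(t for f in coverage for t in coverage[f])
    covered: Set[int] = set()
    kept: List[str] = []
    for t in sorted(freq, key=lambda t: (freq[t], t)):
        if t not in covered:
            f = best[t]
            kept.append(f)
            covered |= coverage[f]
    return kept


# Preferring small files keeps three seeds here, although "big" alone covers everything.
cov = {"big": {1, 2, 3, 4}, "s1": {1, 2}, "s2": {3}, "s3": {4}}
sizes = {"big": 100, "s1": 10, "s2": 10, "s3": 10}
print(afl_cmin_like(cov, sizes))  # ['s1', 's2', 's3']
```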

MinSet

MinSet is the tool developed by Rebert et al. in their paper "Optimizing Seed Selection for Fuzzing". Although we obtained the tool from the authors, it is not open source, so we cannot distribute it here. Please contact the authors if you would like to obtain the source code.

If you have access to the source code, you can perform a MinSet minimization by:

Generate code coverage as described in Tracing Code Coverage above
Expand the generated HDF5 archive using expand_hdf5_coverage.py
Convert the expanded coverage to a set of bitvector traces using MoonBeam
Run the qminset.py wrapper on the bitvector traces
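MinSet and MoonBeam are not public, so their trace format and algorithm cannot be shown here. As a rough illustration of working with bitvector traces, the sketch below encodes each seed's edges as an integer bitmask. It then runs a classic greedy set cover, which we assume is close in spirit to the greedy approximations Rebert et al. describe. This is not MinSet's code, and `to_bitvectors` and `greedy_cover` are hypothetical names.

```python
from typing import Dict, List, Set


def to_bitvectors(coverage: Dict[str, Set[int]]) -> Dict[str, int]:
    """Encode each seed's edge set as an int bitmask over a dense edge index."""
    all_edges = sorted(set().union(*coverage.values())) if coverage else []
    index = {e: i for i, e in enumerate(all_edges)}
    return {seed: sum(1 << index[e] for e in edges) for seed, edges in coverage.items()}


def greedy_cover(bits: Dict[str, int]) -> List[str]:
    """Greedy set cover: repeatedly keep the seed adding the most uncovered edges."""
    remaining = 0
    for b in bits.values():
        remaining |= b
    chosen: List[str] = []
    while remaining:
        # max() returns the first maximal item, so ties go to the smallest name.
        seed = max(sorted(bits), key=lambda s: bin(bits[s] & remaining).count("1"))
        chosen.append(seed)
        remaining &= ~bits[seed]
    return chosen


cov = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}
print(greedy_cover(to_bitvectors(cov)))  # ['A', 'B', 'C']
```

Greedy selection is not always minimal. In this example, B and C alone cover all six edges, which is the gap an exact solver such as OptiMin closes.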

Fuzzing Experiments

In addition to the OptiMin tool, we also provide the necessary infrastructure to reproduce our fuzzing experiments. Detailed instructions are provided here.
