This is the artifact associated with our ISSTA 2021 paper "Seed Selection for Successful Fuzzing". While our primary artifact is the OptiMin corpus minimizer, we also provide the necessary infrastructure to reproduce our fuzzing experiments.
Set up your environment (assumes a modern Ubuntu OS, >= 18.04 && <= 20.04, and Python, >= 3.6 && <= 3.8):
# Install prerequisites
sudo apt update
sudo apt install -y git docker.io python3-venv
# Add yourself to the docker group (don't forget to log out and log back in so
# that the group changes take effect)
sudo usermod -aG docker $USER
# Set up virtualenv
python3 -m venv seed_selection
source seed_selection/bin/activate
pip3 install wheel
# Get this repo
git clone https://github.com/HexHive/fuzzing-seed-selection
pip3 install fuzzing-seed-selection/scripts
OptiMin is our SAT-based corpus minimization tool. It supports coverage generated by both AFL and llvm-cov (only AFL is used in the paper). Similarly, OptiMin can use either Z3 or EvalMaxSAT as its solver back end (only EvalMaxSAT is used in the paper). To build:
docker build -t seed-selection/optimin fuzzing-seed-selection/optimin
OptiMin takes a large "collection corpus" and selects a subset of seeds that are used for fuzzing. This is based on the code coverage for each seed in the collection corpus.
While we provide tools to generate code coverage information for a given corpus (based on afl-showmap), this can be time consuming (depending on the size of the corpus). Thus, we provide seed traces in HDF5 archives.
For example, to perform a corpus minimization based on Google FTS FreeType2 coverage:
- Download the coverage HDF5 from here.
wget https://datacommons.anu.edu.au/DataCommons/rest/records/anudc:6106/data/afl-showmap-coverage/fts/freetype2.hdf5
- Expand the HDF5 using the expand_hdf5_coverage.py script
expand_hdf5_coverage.py -i freetype2.hdf5 -o /tmp/freetype2
# Expected output:
#
# 466 seeds to extract
# Expanding freetype2.hdf5: 100%
- Perform an unweighted minimization based on edges only (not hit counts)
docker run -v /tmp/freetype2:/tmp/freetype2 \
    seed-selection/optimin -e /tmp/freetype2
# Expected output:
#
# afl-showmap corpus minimization
#
# [############################################################] 100% Reading seed coverage
# [############################################################] 100% Generating clauses
# [*] Running Optimin on /tmp/freetype2
# [*] Running EvalMaxSAT on WCNF
# [+] EvalMaxSAT completed
# [*] Parsing EvalMaxSAT output
# [+] Solution found for /tmp/freetype2
#
# [+] Total time: 0.01 sec
# [+] Num. seeds: 37
#
# ...
- Perform an unweighted minimization including edge hit counts
docker run -v /tmp/freetype2:/tmp/freetype2 \
    seed-selection/optimin /tmp/freetype2
# Expected output:
#
# afl-showmap corpus minimization
#
# [############################################################] 100% Reading seed coverage
# [############################################################] 100% Generating clauses
# [*] Running Optimin on /tmp/freetype2
# [*] Running EvalMaxSAT on WCNF
# [+] EvalMaxSAT completed
# [*] Parsing EvalMaxSAT output
# [+] Solution found for /tmp/freetype2
#
# [+] Total time: 0.01 sec
# [+] Num. seeds: 53
#
# ...
- Download the file weights (i.e., sizes) from here.
wget https://datacommons.anu.edu.au/DataCommons/rest/records/anudc:6106/data/weights/ttf.csv
- Perform a weighted minimization based on file size and edges only
docker run -v /tmp/freetype2:/tmp/freetype2 -v $(pwd):/tmp \
    seed-selection/optimin -e -w /tmp/ttf.csv /tmp/freetype2
# Expected output:
#
# afl-showmap corpus minimization
#
# [*] Reading weights from `/tmp/ttf.csv`... 0s
# [############################################################] 100% Calculating top
# [############################################################] 100% Reading seed coverage
# [############################################################] 100% Generating clauses
# [*] Running Optimin on /tmp/freetype2
# [*] Running EvalMaxSAT on WCNF
# [+] EvalMaxSAT completed
# [*] Parsing EvalMaxSAT output
# [+] Solution found for /tmp/freetype2
#
# [+] Total time: 0.01 sec
# [+] Num. seeds: 37
#
# ...
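The three minimization modes above differ only in what must stay covered and in how a selection is costed. The following is a minimal sketch of that idea on a toy corpus, not OptiMin's implementation: OptiMin hands the problem to a MaxSAT solver, whereas this sketch uses an exhaustive search that only works for a handful of seeds. The function names, the toy traces, and the toy weights are all hypothetical, and treating "including hit counts" as preserving (edge, hit count) pairs is an assumption.

```python
from itertools import combinations


def coverage_elements(trace, edges_only):
    """Turn one seed's trace ({edge_id: hit_count}) into the set of items to preserve."""
    if edges_only:
        return set(trace)        # edge IDs only
    return set(trace.items())    # (edge ID, hit count) pairs (assumed interpretation)


def minimize(traces, edges_only=False, weights=None):
    """Return the cheapest set of seeds that preserves the corpus's coverage.

    traces maps seed name -> {edge_id: hit_count}. weights maps seed name -> int >= 1;
    when weights is None every seed costs 1, so the smallest selection wins.
    Exhaustive search over all subsets: toy sizes only.
    """
    items = {seed: coverage_elements(t, edges_only) for seed, t in traces.items()}
    target = set().union(*items.values())
    seeds = sorted(items)
    best, best_cost = set(seeds), None
    for size in range(len(seeds) + 1):
        for subset in combinations(seeds, size):
            if set().union(*(items[s] for s in subset)) != target:
                continue  # this selection loses coverage
            cost = sum(1 if weights is None else weights[s] for s in subset)
            if best_cost is None or cost < best_cost:
                best, best_cost = set(subset), cost
    return best


# Hypothetical toy corpus: seed name -> {edge ID: hit count}
traces = {
    "a.ttf": {1: 1, 2: 1, 3: 1},
    "b.ttf": {1: 1, 2: 4},
    "c.ttf": {3: 1},
    "d.ttf": {1: 1, 2: 1},
}
weights = {"a.ttf": 10, "b.ttf": 2, "c.ttf": 1, "d.ttf": 3}  # hypothetical sizes

print(sorted(minimize(traces, edges_only=True)))                   # edges only
print(sorted(minimize(traces)))                                    # with hit counts
print(sorted(minimize(traces, edges_only=True, weights=weights)))  # weighted
```

In this toy corpus, taking hit counts into account enlarges the selection, because `b.ttf` is the only seed that reaches edge 2 with a hit count of 4. Adding weights swaps one large seed for two small ones.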
The sizes of our collection corpora mean that we cannot store them in a Git repo. Instead, we store ancillary data at ANU's DataCommons repository, available here.
Corpus minimization is typically based on some notion of "code coverage". To ensure a fair and uniform comparison across the three corpus minimization tools (afl-cmin, MinSet, and OptiMin), we use AFL's notion of edge coverage. This coverage information can be generated as follows:
- Compile your target with AFL instrumentation. See the AFL documentation for instructions on how to do this.
- Run replay_seeds.py with your target program and your collection corpus. This will generate an HDF5 archive containing coverage information that can then be minimized.
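To illustrate what a single seed's coverage looks like, here is a minimal sketch that parses one afl-showmap trace into a Python dictionary. It assumes the `edge_id:hit_count` text format that afl-showmap writes by default, one pair per line. The function name and the sample trace are hypothetical, and the sketch says nothing about how replay_seeds.py lays the traces out inside the HDF5 archive.

```python
def parse_showmap(text):
    """Parse afl-showmap text output into {edge_id: hit_count}.

    Assumes one "edge_id:hit_count" pair per line (afl-showmap's default format).
    """
    trace = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        edge, count = line.split(":", 1)
        trace[int(edge)] = int(count)
    return trace


# Hypothetical trace for a single seed
sample = "000123:1\n004567:3\n065535:8\n"
trace = parse_showmap(sample)
print(trace)
```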
Our paper surveys a number of corpus minimization tools: OptiMin, afl-cmin, and MinSet. A more detailed explanation of how to use these tools and reproduce our results is given below.
Instructions for running OptiMin are given above. As described previously, a weighted minimization can be performed by supplying a weights CSV file to OptiMin's -w option. This weights file has the following format:
FILE_1,WEIGHT
FILE_2,WEIGHT
FILE_3,WEIGHT
FILE_4,WEIGHT
FILE_5,WEIGHT
Here, FILE_1, FILE_2, ... are the names of files within the corpus directory (provide only the filename, not the corpus directory path), and WEIGHT is an unsigned integer >= 1. We provide weights for our collection corpora here.
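If you need weights for your own corpus, a file in this format can be produced with a few lines of Python. The following is a minimal sketch, assuming that weights are file sizes in bytes and that empty files are clamped to the minimum weight of 1. The function name and the paths in the usage comment are hypothetical.

```python
import csv
import os


def write_size_weights(corpus_dir, csv_path):
    """Write FILE,WEIGHT rows, weighting each seed by its size in bytes (minimum 1).

    Returns the number of rows written. Only the filename is written, not the
    corpus directory path. Write the CSV outside corpus_dir so that it is not
    listed as a seed itself.
    """
    rows = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for name in sorted(os.listdir(corpus_dir)):
            path = os.path.join(corpus_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories
            writer.writerow([name, max(1, os.path.getsize(path))])
            rows += 1
    return rows


# Usage (hypothetical paths):
# write_size_weights("/tmp/my_corpus", "/tmp/my_weights.csv")
```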
afl-cmin is AFL's inbuilt corpus minimization tool. afl_cmin.py wraps afl-cmin so that it outputs the names of the seeds in the minimized corpus (rather than copying the seeds and wasting storage).
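For contrast with OptiMin's exact approach, the following is a simplified sketch of a greedy selection in the style of afl-cmin: each coverage tuple is assigned the smallest seed that hits it, and tuples are then visited from rarest to most common, selecting a tuple's seed only if the tuple is still uncovered. This is an illustration under those assumptions, not the code in afl-cmin or afl_cmin.py. The function name and the toy data are hypothetical. Like afl_cmin.py, it returns seed names rather than copying files.

```python
def greedy_cmin(traces, sizes):
    """Return the names of the seeds chosen by a greedy, afl-cmin-style pass.

    traces maps seed name -> set of coverage tuples; sizes maps seed name -> bytes.
    """
    best = {}   # tuple -> smallest seed that covers it
    count = {}  # tuple -> number of seeds that cover it
    for name in sorted(traces, key=lambda n: (sizes[n], n)):
        for t in traces[name]:
            best.setdefault(t, name)  # first seen is smallest (sorted by size)
            count[t] = count.get(t, 0) + 1

    covered, selected = set(), []
    for t in sorted(count, key=lambda t: (count[t], t)):  # rarest tuples first
        if t in covered:
            continue
        selected.append(best[t])
        covered |= traces[best[t]]  # everything this seed hits is now covered
    return selected


# Hypothetical toy corpus
traces = {
    "a.ttf": {1, 2, 3, 5},
    "b.ttf": {3, 4},
    "c.ttf": {1, 2},
    "d.ttf": {4},
}
sizes = {"a.ttf": 30, "b.ttf": 10, "c.ttf": 5, "d.ttf": 1}

names = greedy_cmin(traces, sizes)
print(names)
```

A greedy pass like this is fast but does not guarantee the smallest possible corpus.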
MinSet is the tool developed by Rebert et al. in their paper Optimizing Seed Selection for Fuzzing. While we were able to obtain the tool from the authors, it is not open source and thus we are unable to provide it here. Please contact the authors if you would like to obtain the source code.
If you have access to the source code, you can perform a MinSet minimization as follows:
- Generate code coverage as described here
- Expand the generated HDF5 archive using expand_hdf5_coverage.py
- Convert the expanded coverage to a set of bitvector traces using MoonBeam
- Run the qminset.py wrapper on the bitvector traces
In addition to the OptiMin tool, we also provide the necessary infrastructure to reproduce our fuzzing experiments. Detailed instructions are provided here.