/badger

Primary LanguageJavaMIT LicenseMIT

Badger: Complexity Analysis with Fuzzing and Symbolic Execution

Badger means the combination of fuzzing and symbolic execution for complexity analysis. This approach uses the strength and scalability of fuzzing and the precision of symbolic execution. Please have a look at our paper for more information about our approach. This repository contains the source code for the symbolic execution part of Badger (denoted as SymExe), which could be also run in separate.

Getting Started

These instructions will show you how to setup Badger and how to run it on a simple example: Insertion Sort.

Prerequisites

In order to run Badger you need basically two things: (1) a fuzzer, and (2) a symbolic execution engine. As fuzzer we use in our apporach the KelinciWCA, which is provided in its own repository. The symbolic execution engine (denoted as SymExe) is delivered with this repository. It is built on top of Java PathFinder (JPF) and Symbolic PathFinder (SPF), which makes it necessary to have the projects jpf-core and jpf-symbc ready and built. We recommend using the latest SPF version from their GitHub repository and, for now, our own branch of the JPF repository. Please be sure to have the following ".jpf/site.properties" in your home directory:

# JPF site configuration
jpf-core = ${user.home}/.../path-to-jpf-core-folder/jpf-core
jpf-symbc = ${user.home}/.../path-to-jpf-core-folder/pf-symbc
badger = ${user.home}/.../badger-sources/badger
extensions=${jpf-core},${jpf-symbc},${badger}

Build Sources

Badger uses the ant build mechanism similar to JPF and SPF. In order to build the Badger sources, go to the folder "badger-sources/badger" and run the command

ant build

Note: be sure that the site.properties file is setup correctly.

Prepare Folder Structure

Normally we use the following folder structure to analyze an applicaton:

.
+-- config.txt
+-- kelinciwca_analysis
|   +-- src
|   +-- bin
|   +-- bin-instrumented
|   +-- in_dir
|   +-- fuzzer-out
|   |   +-- afl/queue
|   |   +-- symexe/queue
+-- spf_analysis
|   +-- src
|   +-- bin

The src folders include the actual application and the drivers (one for fuzzing and one for symbolic execution). The bin folders include the Java class files. The bin-instrumented folder includes the class files instrumented by KelinciWCA. The in_dir folder includes the initial inputs provided by the user (at least one, see AFL documentation for more information). The fuzzer-out folder will be created automatically by AFL and SymExe, and contains the generated inputs.

Running Badger

In general our approach would run KelinicWCA and SymExe in parallel, i.e. start the Kelinci server, then AFL, and then SymExe.

In order to run the SymExe part of Badger please execute the following command:

DYLD_LIBRARY_PATH=path/to/z3/build java -cp "[..]/badger-source/badger/build/*:[..]/badger-source/badger/lib/*:[..]/jpf-core/build/*:[..]/jpf-symbc/build/*:[..]/jpf-symbc/lib/*" edu.cmu.sv.badger.app.BadgerRunner path/to/config-file

Depending on your application, you might want to add the option -Xmx to allow more memory usage for the symbolic execution. We did run our experiments with "-Xmx10240m".

Configuration File

Badger needs a configuration property file as parameter. The following table shows all possible parameters:

Parameter Description Mandatory
dir.initial.input Input directory for SymExe, usually the same input directory as for KelinciWCA. yes
dir.sync.input Input directory for SymExe, usually the queue folder KelinciWCA. Default: dir.initial.input no
dir.export Export directory for SymExe, usually the queue that KelinciWCA uses for synchronization. yes
dir.tmp Temporary directory for Symexe, which stores all generated files. Default: "./tmp" no
symexe.wait.sec Time (seconds) SymExe will wait before checking KelinciWCA for new inputs, if it completely explored its trie and there was no new input by KelinciWCA. yes
symexe.iterations Number of iterations (=maximum number of generated input files) SymExe will remain in its own execution before attempting to import inputs from KelinciWCA. yes
symexe.delay.sec Time (seconds) SymExe will wait in the beginning before starting. Default: "0" no
symexe.bse.steps Additional steps by the Bounded Symbolic Execution phase in SymExe. Default: "0", i.e. just generate inputs for the children nodes. no
jpf.classpath Path to the application binaries. yes
jpf.target Qualified name of the Java class (driver) that contains the main method for the analysis. yes
jpf.argument Values of the arguments for the driver, separated by spaces. Default: "@@". no
symbolic.method Qualified name of the method(s), which should be contained in the symbolic analysis. Definition similar to SPF. no
symbolic.dp Decision Procedure for constraint solving. Default: "z3". no
symbolic.max_int Maximum value of symbolic integers. no
symbolic.min_int Minimum value of symbolic integers. no
symbolic.max_char Maximum value of symbolic chars. no
symbolic.min_char Minimum value of symbolic chars. no
symbolic.max_byte Maximum value of symbolic bytes. no
symbolic.min_byte Minimum value of symbolic bytes. no
symbolic.max_double Minimum value of symbolic doubles. no
symbolic.min_double Minimum value of symbolic doubles. no
symbolic.debug Print SPF debug information (=on to activate). no
symbolic.undefined SPF default for don't care values (e.g, =0). no
symbolic.optimizechoices SPF flag to use internal optimization. no 
listener Additional listeners to add during bounded symbolic execution phase. no
analysis.method Analysis method for trie exploration: "wca" or "cov". yes
analysis.heuristic Trie exploration heuristic. For "wca": "highest-cost-highest-node". "highest-cost-lowest-node", "lowest-cost-highest-node", "lowest-cost-highest-node". For "cov": "branch". yes
analysis.wca.metric Cost metric: "jumps", "instructions", "userdefined" (necessary for "wca"). no
io.utils Input Generator. Please check source package "edu.cmu.sv.badger.io" for current implementations or add your own. yes
io.input.sizes Abstract input size(s) for input generation, separated by spaces. yes
io.initial.id Initial id for generated input file. Default: "0" no
stat.print Boolean value whether to write files for statistics. Default: "true". no
stat.file.import Path to file for import statistics. Default: "import-statistic.txt" no
stat.file.generation Path to file for generation statistics. Default: "generation-statistic.txt" no
stat.file.export Path to file for export statistics. Default: "export-statistic.txt" no
stat.file.trie Path to file for internal trie statistics. Default: "trie-statistic.txt" no
stat.print.pc Boolean value whether to write files for path condition mapping. Default: "false" no
stat.file.pc.mapping Path to file for pc mapping to generated files. Default: "pcMap.txt" no
trie.print Boolean value whether to write dot files for trie graph representation. Default: "false" (High memory consumption for "true"!). no
trie.print.maxdepth Node depth, up to which the trie will be printed as dot file. Limiting this might help for debugging. no

Example: Insertion Sort

The directory "badger-source/example" contains a script to build the insertion sort example. Please first build Badger like described above and then run the "build-example.sh" script. If you want to run Badger in total (and not only the SymExe), then you need to setup KelinciWCA in advance. You also need to instrument the class files for Kelinci by using the commands described on the KelinciWCA page. The directory "badger-sources/badger/src/examples" includes the drivers for the insertion sort example.

 public static void sort(int[] a) {
        final int N = a.length;
        for (int i = 1; i < N; i++) {
            int j = i - 1;
            int x = a[i];
            while ((j >= 0) && (a[j] > x)) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

For the Insertion Sort example the following configuration file is sufficient (to run SymExe alone):

dir.initial.input=./kelinciwca_analysis/in_dir
dir.export=./kelinciwca_analysis/fuzzer-out/symexe/queue

jpf.classpath=./symexe_analysis/bin
jpf.target=InsertionSortSym
symbolic.method=InsertionSortSym.sort(con)
symbolic.dp=z3
symbolic.min_int=0
symbolic.max_int=255

symexe.wait.sec=60
symexe.iterations=10

analysis.method=wca
analysis.wca.metric=jumps
analysis.heuristic=highest-cost-lowest-node

io.utils=int-byte-array
io.input.sizes=64

If you want to run the SymExe part with KelinciWCA (i.e. Badger in total), then please add the sync directory property:

dir.sync.input=./kelinciwca_analysis/fuzzer-out/afl/queue

, and configure KelinciWCA to use this path as queue folder. To run SymExe only, please execute the above described command to run the BadgerRunner class. If you want to run the whole Badger: first run the Kelinci server (be sure to use the instrumented class files), start KelinciWCA, and then run the above described command to run the BadgerRunner class. Alternatively, you also can use the provided run script.

Run Script

In the folder "badger-sources/scripts" we provide the shell script "runBadger.sh" to run Badger/KelinciWCA/Symexe for insertion sort. It can be used as a template for other experiments. Please adjust the path variables in the script according to your system. Additionally note that we use the DYLD_LIBRARY_PATH variable to set the path to the z3 constraint solver. Please change this according to your setup environment. If you want to run Badger in total, i.e. KelinciWCA + SymExe, then please make sure that KelinciWCA is setup correctly and that the classes are instrumented. You might need to change some folder paths in the script.

Developers

  • Yannic Noller (yannic.noller at acm.org)

License

This project is licensed under the MIT License - see the LICENSE file for details