/symsan

A LLVM Sanitizer for Symbolic Tracing

Primary LanguageC++Apache License 2.0Apache-2.0

License

SymSan: Time and Space Efficient Concolic Execution via Dynamic Data-Flow Analysis

SymSan (Symbolic Sanitizer) is an efficient concolic execution engine based on the Data-Floow Sanitizer (DFSan) framework. By modeling forward symbolic execution as a dynamic data-flow analysis and leveraging the time and space efficient data-flow tracking infrastructure from DFSan, SymSan imposes much lower runtime overhead than previous symbolic execution engines.

Similar to other compilation-based symbolic executor like SymCC, SymSan uses compile-time instrumentation to insert symbolic execution logic into the target program, and a runtime supporting library to maintain symbolic states during execution.

To learn more, checkout our paper at USENIX Security 2022.

Building

Because SymSan leverages the shadow memory implementation from LLVM's sanitizers, it has more strict dependency on the LLVM version. Right now only LLVM 12 is tested.

Build Requirements

  • Linux-amd64 (Tested on Ubuntu 20.04)
  • LLVM 12.0.1: clang, libc++, libc++abi

Compilation

Create a build directory and execute the following commands in it:

$ CC=clang-12 CXX=clang-12 cmake -DCMAKE_INSTALL_PREFIX=/path/to/install -DCMAKE_BUILD_TYPE=Release /path/to/symsan/source
$ make
$ make install

Build in Docker

docker build -t symsan .
docker run  --rm --ulimit core=0 symsan bash -c 'cd /workdir/symsan && ./check.sh 2>/dev/null'

LIBCXX

The repo contains instrumented libc++ and libc++abi to support C++ programs. To rebuild these libraries from source, execute the rebuild.sh script in the libcxx directory.

NOTE: because the in-process solving module (solver/z3.cpp) uses Z3's C++ API and STL containers, so itself depends on the C++ libs. Due to such dependencies, you'll see linking errors when building C++ targets when using this module. Though it's possible to resolve these errors by not instrumenting the dependencies (adding them to the ABIList, then rebuild the C++ libs), we don't recommend using it for C++ targets. Instead, it's much cleaner to use ann out-of-process solving module like Fastgen.

Test

To verify the code works, try some simple tests (forked from Angora, adapted by @insuyun to lit):

$ pip install lit
$ cd your_build_dir
$ lit tests

Environment Options

  • KO_CC specifies the clang to invoke, if the default version isn't clang-12, set this variable to allow the compiler wrapper to find the correct clang.

  • KO_CXX specifies the clang++ to invoke, if the default version isn't clang++-12, set this variable to allow the compiler wrapper to find the correct clang++.

  • KO_USE_Z3 enables the in-process Z3-based solver. By default, it is disabled, so SymSan will only perform symbolic constraint collection without solving. SymSan also supports out-of-process solving, which provides better compatiblility. Check FastGen.

  • KO_USE_NATIVE_LIBCXX enables using the native uninstrumented libc++ and libc++abi.

  • KO_DONT_OPTIMIZE don't override the optimization level to O3.

Hybrid Fuzzing

SymSan needs a driver to perform hybrid fuzzing, like FastGen. It could also be used as a custom mutator for AFL++.

Documentation

Still under construction, unfortunately.

Reference

To cite SymSan in scientific work, please use the following BibTeX:

@inproceedings {chen2022symsan,
  author =       {Ju Chen and Wookhyun Han and Mingjun Yin and Haochen Zeng and
                  Chengyu Song and Byoungyong Lee and Heng Yin and Insik Shin},
  title =        {SymSan: Time and Space Efficient Concolic Execution via Dynamic Data-Flow Analysis},
  booktitle =    {{USENIX} Security Symposium (Security)},
  year =         2022,
  url =          {https://www.usenix.org/conference/usenixsecurity22/presentation/chen-ju},
  publisher =    {{USENIX} Association},
  month =        aug,
}