
SMT-JIT

A toy Just-In-Time Compiler for evaluating SMT formulas (QF_AUFBV)


This project implements a simple JIT compiler for the QF_AUFBV subset of SMT-LIB emitted by the KLEE symbolic execution engine, including unbounded arrays of bitvectors and fixed-width bitvectors. SMT-JIT uses LLVM's ORCv2 API for JIT compilation.

Given an SMT formula and a set of full assignments in the SMT-LIB2 format, SMT-JIT will compile the SMT formula to LLVM IR, emit x86-64 machine code, and evaluate the formula with the assignments. The main motivation is to be able to evaluate or fuzz SMT formulas before querying the main SMT solver for satisfiability, e.g., in KLEE's CexCachingSolver or Z3's local search phase.

Example:

(set-logic QF_AUFBV )
(declare-fun arg00 () (Array (_ BitVec 32) (_ BitVec 8) ) )
(assert (let ( (?B1 (select  arg00 (_ bv5 32) ) ) ) (and  (and  (=  false (=  (_ bv0 8) ?B1 ) ) (=  false (=  (_ bv61 8) ?B1 ) ) ) (=  false (=  (_ bv112 8) ?B1 ) ) ) ) )
(check-sat)
(exit)

The above formula will be compiled into the following IR:

define i32 @smt_0(%struct.bv_array_t** nocapture readonly %arrays) #5 {
entry:
  %arg00 = load %struct.bv_array_t*, %struct.bv_array_t** %arrays, align 8
  %0 = getelementptr inbounds %struct.bv_array_t, %struct.bv_array_t* %arg00, i64 0, i32 0
  %1 = load i64, i64* %0, align 8, !tbaa !2, !alias.scope !6
  %2 = trunc i64 %1 to i32
  %3 = icmp ugt i32 %2, 5
  %.op = and i64 %1, 4294967295
  %4 = select i1 %3, i64 5, i64 %.op
  %5 = getelementptr inbounds %struct.bv_array_t, %struct.bv_array_t* %arg00, i64 0, i32 1, i64 %4, i32 2, i32 0
  %6 = load i64, i64* %5, align 8, !alias.scope !6
  switch i64 %6, label %cont [
    i64 112, label %fail_1
    i64 61, label %fail_1
    i64 0, label %fail_1
  ]

fail_1:                                           ; preds = %entry, %entry, %entry, %cont
  %merge = phi i32 [ 1, %entry ], [ 0, %cont ], [ 1, %entry ], [ 1, %entry ]
  ret i32 %merge

cont:                                             ; preds = %entry
  br label %fail_1
}

Compilation of the above formula and evaluation over 1 million full assignments takes about 30 milliseconds.


Table of Contents

  1. SMT-JIT Architecture
  2. Benchmark Collection
  3. Mixed-Precision BitVector Library
  4. BitVector Array Handling
  5. SMT-JIT Optimization Pipeline
  6. Experimental Evaluation
  7. Related Work

1. SMT-JIT Architecture

A non-exhaustive list of dependencies required to build the project: LLVM (with the ORCv2 JIT libraries), Clang, CMake, and the bundled sexpresso S-expression parser (a git submodule).

The project has 2 main subdirectories:

  1. jit -- the SMT JIT evaluator. jit/bvlib contains the custom bitvector library implementation.
  2. klee -- contains a patch to KLEE that allows dumping SMT queries and assignments inside the CexCachingSolver. In addition, the directory also has the scripts used to generate the benchmarks. klee/queries contains a bunch of queries dumped from KLEE after running on a few coreutils programs.

SMT-JIT can be built by following the jit/config*.sh scripts. Make sure to get the sexpresso submodule by cloning with --recursive or by initializing it manually.

To run SMT-JIT on some sample inputs: ./smt-jit smt-jit/jit/inputs cat.q1.smt2 cat.q2.smt2

The benchmarking mode can be entered by adding --benchmark --iterations=K, where K is the number of evaluation iterations. To see the generated IR files, add --save-temps --temp-dir=DIR, where DIR is a valid directory path.
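
For example, a benchmark run over the bundled inputs that also saves the generated IR could look as follows (the flag placement and iteration count are illustrative, not taken from the project's documentation):

./smt-jit --benchmark --iterations=100000 --save-temps --temp-dir=/tmp/smt-ir smt-jit/jit/inputs cat.q1.smt2 cat.q2.smt2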

2. Benchmark Collection

The KLEE benchmarks were collected by instrumenting the CexCachingSolver and dumping the queries in the SMT-LIB2 format, together with all attempted assignments. KLEE was run on cat and echo, as specified in the klee/runs.txt file; the coreutils bitcode was obtained by following the official KLEE tutorial on testing coreutils.

The experiments showed that most of the queries attempted by KLEE are fairly simple and involve reasoning about wide bitvectors (up to 512 bits) and short bitvector arrays. SMT-JIT is designed to handle only the subset of SMT-LIB2 that appears in this benchmark suite.

3. Mixed-Precision BitVector Library

After analyzing the collected SMT queries, it became apparent that efficient mixed-precision bitvector support is needed to evaluate the formulas generated by KLEE.

Even though mature mixed-precision arithmetic libraries exist (e.g., GMP, Boost::MPA, llvm::APInt), the authors decided to implement a custom bitvector library. The key insight is that although KLEE generates operations over bitvectors wider than machine arithmetic (typically 64-bit), the actual runtime values are fairly small and fit in machine integers. Based on that observation, bvlib supports switching from the native to the BigInt representation, but it only does so when more than 64 bits are actually needed to perform the requested arithmetic operation. Bvlib tracks an approximate number of bits needed and tries very hard not to fall back to the BigInt representation. This never happened during the experiments described in the evaluation section, and thus the BigInt operations are not implemented and simply trap.

One of the main design goals was to make all the basic bitvector arithmetic operations branch-free, and thus cheap to reason about symbolically. Another important goal was to expose the bitvector functions to the JIT optimizer by maintaining their LLVM bitcode representation; that is, the library implementation is available in bitcode form to both the host and JIT compilers. The version loaded at runtime by the JIT is first heavily optimized offline with Clang's -O3 pipeline.
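
The C++ sketch below illustrates the small-representation idea. It is not the actual bvlib code: the struct layout, field names, and the bv_add helper are made up for illustration, and the real library keeps these operations branch-free, which the conditionals below trade away for readability.

#include <cstdint>
#include <cstdlib>

// Hypothetical small-representation bitvector: the value lives in a native
// 64-bit word, alongside its declared width and a conservative bound on the
// number of significant bits.
struct bv {
  uint64_t value;      // low 64 bits of the bitvector value
  uint32_t width;      // declared SMT width; may be larger than 64
  uint32_t bits_used;  // approximate upper bound on significant bits
};

inline bv bv_add(bv a, bv b) {
  bv r;
  r.width = a.width;   // SMT-LIB bvadd operands share the same width
  // Adding two values of at most k significant bits needs at most k + 1 bits.
  r.bits_used = (a.bits_used > b.bits_used ? a.bits_used : b.bits_used) + 1;
  if (r.width > 64 && r.bits_used > 64)
    abort();           // would require the (unimplemented) BigInt path
  r.value = a.value + b.value;
  if (r.width < 64)
    r.value &= (uint64_t(1) << r.width) - 1;  // wrap to the declared width
  return r;
}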

Bvlib exposes a small header-only interface with no transitive includes, while the implementation uses C++14 and a small subset of the C++ standard library.

4. BitVector Array Handling

In the First-Order Theory of Arrays, arrays are functions taking indices and returning elements. Arrays are unbounded and can have default values at unspecified array indices. Sample SMT array declaration: (declare-fun arg00 () (Array (_ BitVec 32) (_ BitVec 8) ) ). In this case, arg00 is a 32-bit-addressable array of 8-bit values of unbounded size.

Bvlib doesn't restrict the width of individual bitvector array elements -- array elements can have different widths. This is because the 'static' bitvector width is set during construction and does not change, so carrying it around does not substantially affect performance and allows 'full' bitvector elements to be retrieved with simple loads.

Unlike SMT arrays, bvlib arrays have a fixed, immutable length. In order to support default array values, all accesses past the initialized size load the one-past-last array element; this is handled by over-allocating each array by one extra element. Bitvector arrays are dynamically allocated with a custom bump-pointer allocator, and the host and the JIT can use separate memory pools.
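
A rough C++ sketch of this layout and of a clamped select follows. The type and function names are illustrative and do not match the exact structs used by SMT-JIT; the element type carries a per-element width, as described in the previous section.

#include <cstdint>

// Illustrative element and array layout: a fixed-size array over-allocated by
// one element, where elems[size] holds the theory-of-arrays default value.
struct bitvector {
  uint64_t value;
  uint32_t width;      // per-element width
};

struct bv_array {
  uint64_t size;       // number of initialized elements
  bitvector elems[1];  // trailing array; allocated with size + 1 slots
};

// Any index past the initialized range is clamped onto the default slot,
// mirroring the clamp visible in the generated IR shown earlier.
inline bitvector bv_array_select(const bv_array *a, uint64_t idx) {
  uint64_t clamped = idx < a->size ? idx : a->size;
  return a->elems[clamped];
}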

5. SMT-JIT Optimization Pipeline

SMT-JIT uses the new ORCv2 LLVM JIT library. While ORC makes it easy to introduce custom optimization pipelines and link different modules together, it is not easy to perform function recompilation. Because of this limitation, SMT-JIT does not attempt any profiling or recompilation, and relies on heavily optimizing the SMT formulas upon the first compilation.

At startup, SMT-JIT loads the bvlib bitvector library bitcode emitted by Clang. The bitcode is already heavily optimized by Clang for the native host, so no further optimization is performed at this point. The module is then cloned to serve as a template for the modules of all future-generated SMT formulas. All small bitvector arithmetic functions are marked alwaysinline, while the other functions are externalized.
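
A minimal sketch of this startup step is shown below. It is not the project's actual source: the bvlib.bc file name and the helper name are illustrative, and for brevity it marks every defined function alwaysinline rather than only the small arithmetic helpers.

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include <memory>

// Load the Clang-emitted, pre-optimized bvlib bitcode once at startup.
std::unique_ptr<llvm::Module> loadBvlibTemplate(llvm::LLVMContext &Ctx) {
  llvm::SMDiagnostic Err;
  std::unique_ptr<llvm::Module> M = llvm::parseIRFile("bvlib.bc", Err, Ctx);
  if (!M)
    return nullptr;
  for (llvm::Function &F : *M)
    if (!F.isDeclaration())
      F.addFnAttr(llvm::Attribute::AlwaysInline);  // let the JIT pipeline inline the helpers
  return M;
}

// Each newly compiled SMT formula then starts from a clone of this template,
// e.g. via llvm::CloneModule (the exact signature varies across LLVM versions).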

Each SMT formula is initially generated as n + 1 functions, where n is the total number of assertions. Each function takes as input all the declared bitvector arrays. There are n functions that each correspond to a single assertion, and an additional function that checks which assertion, if any, failed. The assertion functions are also marked alwaysinline. All generated functions are given appropriate attributes and linkage types; the only function with external linkage is the main function that calls the assertion functions.
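
A hypothetical C-like rendering of this layout for a formula with two assertions is shown below. The function names and the exact return convention are guesses, although a nonzero result identifying a failed assertion is consistent with the example IR shown earlier.

struct bv_array_t;  // opaque handle to a bitvector array

// Per-assertion checkers: internal linkage, marked alwaysinline in the real module.
// Their bodies are generated from the corresponding SMT assertions; trivial
// placeholders are used here so the sketch compiles on its own.
static bool assertion_0(bv_array_t **arrays) { (void)arrays; return true; }
static bool assertion_1(bv_array_t **arrays) { (void)arrays; return true; }

// The only externally visible function: reports which assertion, if any, failed
// under the given assignment (0 means every assertion holds).
extern "C" int smt_entry(bv_array_t **arrays) {
  if (!assertion_0(arrays))
    return 1;
  if (!assertion_1(arrays))
    return 2;
  return 0;
}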

Before emitting machine code, SMT-JIT runs a series of LLVM optimization passes (a sketch of this pipeline follows the list):

  • Always Inliner Pass
  • Instruction Combining
  • Global Value Numbering
  • Control Flow Graph Simplification
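
The sketch below shows how such a pipeline can be driven through LLVM's legacy pass manager. It is not the project's actual source and the helper name is illustrative, but the four pass-creation functions are the standard LLVM entry points for the passes listed above.

#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"
#include "llvm/Transforms/InstCombine/InstCombine.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"

void optimizeFormulaModule(llvm::Module &M) {
  llvm::legacy::PassManager PM;
  PM.add(llvm::createAlwaysInlinerLegacyPass());   // inline bvlib helpers and per-assertion functions
  PM.add(llvm::createInstructionCombiningPass());  // peephole simplifications
  PM.add(llvm::createGVNPass());                   // redundancy elimination
  PM.add(llvm::createCFGSimplificationPass());     // clean up the resulting control flow
  PM.run(M);
}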

In the authors' experience, inlining the bitvector arithmetic functions is an indispensable optimization in this architecture. Together with peephole optimizations, constant propagation, and redundancy elimination, it allows most of the original instructions to be removed entirely.

In extreme cases, this optimization pipeline can even prove some of the input SMT formulas to be unsatisfiable, e.g.:

(set-logic QF_AUFBV )
(declare-fun arg01 () (Array (_ BitVec 32) (_ BitVec 8) ) )
(assert (let ( (?B1 (select arg01 (_ bv1 32) ) ) ) (let ( (?B2 ((_ extract 0 7) ((_ sign_extend 24) ?B1 ) ) ) ) (and (and (and (= (_ bv101 8) ?B2 ) (= false (= (_ bv0 8) ?B1 ) ) ) (= false (= (_ bv45 8) ?B1 ) ) ) (= false (= (_ bv98 8) ?B2 ) ) ) ) ) )
(check-sat)
(exit)

The generated LLVM IR:

define i32 @smt_3(%struct.bv_array_t** nocapture readonly %arrays) #5 {
entry:
  ret i32 1
}

6. Experimental Evaluation

The experiments were performed on the solver queries dumped from KLEE's CexCachingSolver using the klee/klee.diff patch. Two coreutils programs, cat and echo, were run under the KLEE interpreter mode using the commands in klee/runs.txt.

To check if an assignment is a model, KLEE walks the expression DAG and recursively calculates the final evaluation result. The expressions are already in memory and do not need to be converted to any intermediate form before evaluation. The results for SMT-JIT do not include parsing time, as parsing would not be necessary if SMT-JIT were invoked from inside KLEE. SMT-JIT took less than a millisecond to compile each SMT formula.

For every query, every assignment was evaluated exactly 100,000 times. To keep the implementation simple, in the case of KLEE each single assignment was attempted 100,000 times in a row, while the SMT-JIT benchmarks performed 100,000 loops over all assignments. The authors suspect that this may have helped the KLEE evaluator, e.g., because of easier branch prediction for consecutive evaluations of the same assignment.

The evaluation was performed on a dual-socket workstation with Intel Xeon E5-2680 CPUs and 96 GiB of RAM, running Ubuntu 18.04. The system used the performance CPU governor. All programs were compiled in the Release CMake mode with clang-6.0 as the C and C++ compiler and lld-6.0 as the linker. Note that every experiment was run only once, so the reported times should be treated with due caution. All benchmark files are available under jit/inputs.

Benchmark   SMT-JIT [ms]   KLEE [ms]
echo.q0     384            13748
echo.q1     94             6049
echo.q2     83             5862
echo.q3     60             5548
echo.q4     229            3444
echo.q5     71             5793
echo.q6     349            1348
echo.q7     335            1317
echo.q8     330            3604
echo.q9     262            1108
echo.q10    258            9039
echo.q11    244            3504
echo.q12    244            2945
echo.q13    257            3476
echo.q14    236            1029
echo.q15    216            956
echo.q16    153            780
echo.q17    287            1166
echo.q18    301            2099
echo.q20    280            1490
echo.q21    222            3315
echo.q22    266            896
echo.q23    317            1231
echo.q24    202            949
echo.q25    365            1392
echo.q26    272            1113
echo.q27    168            822
echo.q28    247            3682
echo.q29    178            831
echo.q30    193            2539

7. Related Work

To the authors' best knowledge, there are no previous 'true' JIT compilers for SMT; there are a few tools that attempt to translate SMT formulas to C or C++:

  • goSAT [1] focuses on floating-point formulas and translates them to C code, trying to use native floating-point arithmetic.
  • JIT Fuzzing Solver [2] translates SMT formulas into C++ code and fuzzes them with the AFL fuzzer in the hope of finding satisfying assignments. In contrast to SMT-JIT, it does not support bitvectors wider than machine integers (> 64 bits).
  • SMTLib2C [3] translates SMT formulas to C or Lustre code.

References

[1]: M. Ammar Ben Khadra, Dominik Stoffel, Wolfgang Kunz: goSAT: Floating-point satisfiability as global optimization. FMCAD 2017

[2]: D. Liew: JIT Fuzzing Solver, https://github.com/delcypher/jfs

[3]: A. Katis: SMTLib2C, https://github.com/andrewkatis/SMTLib2C