/SeeWasm

A native symbolic execution engine for WebAssembly

Primary LanguagePython

SeeWasm

SeeWasm-logo

WebAssembly (Wasm), as a low-level language, has several advantages. Moreover, Wasm can be translated from high-level mainstream programming languages, e.g., C, C++, Go, Rust and so on.

In this project, we have implemented a symbolic execution engine for Wasm binaries, SeeWasm. Our goal is to build a set of toolchain, which can take source code file (written in other programming languages) as input, symbolically execute it, and output feasible paths with their solutions for further analysis (e.g., vulnerability detection).

 Prerequisites 

First, you have to make sure Python3.7+ is installed.

Then, some libraries should be installed as follows:

python -m pip install -r requirements.txt

You can test if all of them are installed successfully by:

python test.py

This command will traverse the ./test folder and extract all Wasm binaries. If all of them can be symbolically executed without any exceptions, the success info would shown in your terminal after several seconds.

Some samples exist in the folder, including hello world written in C, Go, and Rust. These Wasm binaries can be compiled from C, Go and Rust, respectively, and the compiling processes are illustrated in here(C and Rust), and here(Go). We will not repeat how to compile programs into Wasm binaries in this readme.

However, for Rust programs, you can also use our one-shot solution: ./build-rs.sh -f|-d PATH, to compile and demangle rust files or projects. Run ./build-rs.sh -h for more information.

Analyze

In this section, we would show how to use SeeWasm to analyze the generated Wasm file.

Options

All valid options are shown in below:

SeeWasm, a symbolic execution engine for Wasm module

optional arguments:
  -h, --help            show this help message and exit

Input arguments:
  -f WASMMODULE, --file WASMMODULE
                        binary file (.wasm)
  --stdin STDIN         stream of stdin
  --sym_stdin SYM_STDIN
                        stream of stdin in N bytes symbols
  --args ARGS           command line
  --sym_args SYM_ARGS [SYM_ARGS ...]
                        command line in symbols, each of them is N bytes at most
  --sym_files SYM_FILES SYM_FILES
                        Create N symbolic files, each of them has M symbolic bytes
  --source_type [{c,go,rust}]
                        type of source file

Features:
  --entry ENTRY         set entry point as the specilized function
  --visualize           visualize the ICFG on basic blocks level
  --incremental         enable incremental solving
  -v [{warning,info,debug}], --verbose [{warning,info,debug}]
                        set the logging level

Analyze:
  -s, --symbolic        perform the symbolic execution

We will detail these options according to their functionalities.

Input Arguments

According to values given to input arguments, SeeWasm can deassemble the target binary and construct valid inputs.

Specifically, -f is not an optional option, following which the path of to be analyzed Wasm binary should be given. --stdin STRING and --sym_stdin N can pass concrete or symbolic bytes through the stdin stream. The difference between them is that a concrete string has to be passed through --stdin, and a string consisting of N symbolic characters need to be passed through --sym_stdin. For example, --sym_stdin 5 will input 5 symbolic bytes if some functions need to read input from stdin.

Similarly, --args STRING1, STRING2, ... and --sym_args N1, N2, ... pass concrete and symbolic arguments to the Wasm binary. For instance, if main requires three arguments where each of them should be two bytes, --sym_args 2 2 2 is enough.

Some programs will interact with files and conduct reading and writing. SeeWasm can also simulate this by a symbolic file system. Users have to apply --sym_files N M to create N symbolic files, where each of them has (or can hold) M bytes at most.

Finally, as several high-level programming languages can be compiled to Wasm binaries. We have achieved some specific optimizations, but users have to indicate the source language by --source_types.

Features

--entry can tell SeeWasm which function is the entry, from which the symbolic execution performs. Note that the __original_main is the default entry for all Wasm binaries following WASI standard. The toolchain we mentioned in the previous section can generate Wasm binaries following WASI standard.

As we mentioned in our paper, the given Wasm will be parsed into ICFG. Sometimes visualizing the ICFG is necessary for debugging. Thus --visualize can achieve this goal.

During symbolic execution, constraints solving is a bottleneck for the performance. We have implemented a set of optimizations on the solving process. The --incremental refers to incremental solving, which may not always introduce positive optimizations during the analysis. Therefore, we set a flag to allow users to decidie if enable the incremental solving.

The -v is an optional option. Accoding to different values, different levels of logging can be generated, which may help the debugging.

Analyze

The -s is a mandatory option. It will enable the symbolic execution analysis on the given Wasm binary.

Example

If we want to execute a program which does not requrie any extra arguments and input, the command should be:

python launcher.py -f PATH_TO_WASM_BINARY -s

The corresponding logging and results of feasible paths will be generated in ./log folder.

If compilicated arguments are required. For example, a base64 program whose main is like:

// main of base64
int main(int argc, char **argv)
{
  // environment setting
  ...

  while ((opt = getopt_long(argc, argv, "diw:", long_options, NULL)) != -1)
    switch (opt) {
      // call functions according to passed arguments
      ...
    }

  // encode or decode
}

We can see that the base64 not only requires a two bytes arguments, but also needs a string of input to encode or decode. Also, the encoded or decoded results will go to a file. Thus, the command to analyze the base64 is like:

python launcher.py -f PATH_TO_BASE64 -s --sym_args 2 --sym_stdin 5 --sym_files 1 10 -v info