RedBench: A Benchmark for Test Case Reduction

This repository contains the RedBench benchmark for the evaluation of test case reduction techniques. It accompanies the research paper "P. Kreutzer, T. Kunze, M. Philippsen: Test Case Reduction: A Framework, Benchmark, and Comparative Study" published at ICSME'21.

RedBench includes 321 fuzzer-generated C and SMT-LIB 2 test cases that trigger different bugs in real compilers (see below for some detailed statistics), as well as an automated execution environment based on Docker to evaluate each test program with the respective compiler.

In addition, this repository contains helper scripts to reduce all benchmark programs with the reducers contained in our RedPEG framework. This makes it possible to replicate the reduction results from our research paper (for convenience, the repository also includes the final results, see below).

Citations

The RedBench programs have been generated with the following compiler fuzzers from the scientific literature:

  • Brummayer, R., Biere, A.: Fuzzing and Delta-Debugging SMT Solvers. In: SMT’09: International Workshop on Satisfiability Modulo Theories (Montreal, Canada, Aug. 2009), 1–5.
  • Kreutzer, P., Kraus, S., Philippsen, M.: Language-Agnostic Generation of Compilable Test Programs. In: ICST’20: International Conference on Software Testing, Verification and Validation (Virtual, Oct. 2020), 39–50.
  • Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and Understanding Bugs in C Compilers. In: PLDI’11: Programming Language Design and Implementation (San Jose, CA, Jun. 2011), 283–294.

If you want to cite RedBench, please cite our ICSME'21 research paper:

  • Kreutzer, P., Kunze, T., Philippsen, M.: Test Case Reduction: A Framework, Benchmark, and Comparative Study. In: ICSME'21: International Conference on Software Maintenance and Evolution (Virtual, Luxembourg, Sep. 2021), 58–69.

Benchmark Statistics

RedBench currently contains 321 failure-inducing programs:

| language  | fuzzer  | #progs | min. size | med. size | max. size |
|-----------|---------|-------:|----------:|----------:|----------:|
| C         | Csmith  | 122    | 1.0 KiB   | 113.7 KiB | 430.6 KiB |
| C         | *Smith  | 128    | 3.0 KiB   | 128.9 KiB | 910.5 KiB |
| SMT-LIB 2 | FuzzSMT | 26     | 1.2 KiB   | 4.5 KiB   | 53.4 KiB  |
| SMT-LIB 2 | *Smith  | 45     | 0.9 KiB   | 12.2 KiB  | 99.7 KiB  |

When designing the benchmark, we made sure to include programs of varying size; please refer to the size distributions of the C test cases and the SMT-LIB 2 test cases for more details.

The RedBench programs trigger 110 different bugs in 19 different versions of 5 real compilers:

| language  | compiler | versions                          | #bugs | #progs |
|-----------|----------|-----------------------------------|------:|-------:|
| C         | GCC      | 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0 | 47    | 134    |
| C         | LLVM     | 1.9, 2.0, 2.1, 2.2                | 42    | 116    |
| SMT-LIB 2 | Yices    | 2.2.0, 2.3.0, 2.4.0, 2.6.0        | 7     | 22     |
| SMT-LIB 2 | z3       | 4.4.0                             | 2     | 7      |
| SMT-LIB 2 | CVC4     | 1.4, 1.5, 1.6, 1.7, 1.8           | 12    | 42     |

To keep the benchmark diverse, we included at most 4 programs (of different size) per bug and fuzzer.

Dependencies

RedBench has been developed and tested on Debian 10. If you want to use RedBench for your own experiments, the following (Debian) packages are required:

  • docker.io
  • python3-jinja2
  • python3-yaml

The instructions below assume that these packages have been installed.
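
On a Debian 10 host, they can be installed with apt, for example:

    sudo apt-get install docker.io python3-jinja2 python3-yaml
    # depending on your setup, your user may also need to be added to the
    # docker group to run Docker containers without root privileges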

Directory Structure

This repository is structured as follows:

  • The tools/ subdirectory contains some command line tools that either are required for setting up RedBench or simplify its use, see below.
  • The docker/ subdirectory contains the Docker based execution environment that includes the different compiler versions targeted by the RedBench programs. The instructions below explain how to build and use this environment.
  • The testsuite/ subdirectory contains the benchmark programs that RedBench consists of, separated by language. Each test case is augmented with some metadata that precisely describes the bug that it triggers. The actual test functions that check whether a program (or reduction candidate) triggers the bug are automatically generated based on this metadata. The instructions below explain how to generate the test functions (as well as several helper scripts to evaluate each test program in the respective Docker container). This directory also contains the reduction results from our comparative study.
  • The reduction/ subdirectory contains helper scripts for running the reducers in the RedPEG framework on the RedBench test cases, see below.

Command Line Tools

This repository contains some command line tools in the tools/ subdirectory. These tools either are required for setting up RedBench or simplify its use:

  • check_testcases.sh: Checks for each test program in the given path whether it really triggers the specified bug in the respective compiler (the checks can be repeated multiple times to check for deviating results in case of non-deterministic bugs). This requires that the execution environment has been set up correctly and that the test functions have been generated (see below); a usage example follows this list.
  • dq.py: RedBench makes heavy use of YAML files (e.g., to store the metadata for each test case). The dq.py command line tool queries such YAML files for specific fields; its output is then further processed by other tools.
  • j2.py: RedBench generates several files from Jinja2 templates (e.g., the Dockerfiles for the execution environment or the test functions). The j2.py command line tool expands such templates.
  • label_bugs.py: Assigns ascending bug IDs for the different bugs; only needed when new programs/bugs are added to RedBench.
  • link.sh: Creates symbolic links to further structure and categorize the test cases (see below).
  • remove_old_results.sh: Removes all reduction results in the given path that are not marked as the latest results (see below).
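
For example, once the execution environment has been set up and the test functions have been generated (see the sections below), all C test cases could be re-checked roughly like this; the path argument is illustrative, so consult the script itself for its exact interface:

    # check that every C test program still triggers its specified bug
    ./tools/check_testcases.sh testsuite/c/testcases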

Setting up RedBench

In a nutshell, setting up RedBench consists of two steps: (1) building the execution environment and (2) generating the test functions (and helper scripts) of the test suite. The following instructions give some more details on the execution environment and the test suite, and we highly recommend reading them; but if you really want to skip the details, simply type make in the root directory of the RedBench repository. Alternatively, run make docker (to only build the execution environment) or make testsuite (to only generate the test functions and helper scripts).
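
In short:

    # in the root directory of the RedBench repository
    make              # build the execution environment and the test suite
    make docker       # only build the execution environment
    make testsuite    # only generate the test functions and helper scripts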

Warning: building the execution environment might take quite long (expect multiple hours).

Execution Environment

As indicated above, RedBench provides a Docker based execution environment that includes the different compiler versions that the test programs target. The docker/ subdirectory contains the necessary files. Note: At first glance, the execution environment might seem somewhat complicated, but we had extensibility and maintainability in mind when constructing it. To achieve these goals, the execution environment uses Jinja2 templates that are expanded with data from YAML files. The following instructions explain in more detail how everything works together.

Currently, there are three different groups of images: base, c, and smt2. The base images provide a basic execution environment (e.g., they include an OpenJDK installation that is required to run the RedPEG reducers); RedBench currently uses different Debian and Ubuntu versions for these base images. The c and smt2 images are built upon the base images and add the different versions of the C and SMT-LIB 2 compilers that the RedBench test programs target.

The Jinja2 templates are contained in the _templates subdirectories. There are template files for the different Dockerfiles (from which the actual Docker images are built) and for several helper scripts (which are used for building the Docker images and for running the Docker containers).

YAML files describe the different versions that should be built and include additional information that is required to build these versions. For example, the YAML files base/debian/data.yml and base/ubuntu/data.yml contain the necessary information for the Debian and Ubuntu base images, whereas the YAML file c/gcc/data.yml contains the data for the different versions of the GCC C compiler.

Building the Environment

From a technical point of view, building the Docker images for all versions of a compiler (or base image) consists of the following steps:

  • The build_all.sh and build.sh helper scripts are generated from Jinja2 templates.
  • The build_all.sh script is executed. It reads the data from the YAML file for this compiler (or base image) and runs the generated build.sh script for each specified version.
  • For each version, the build.sh script uses the additional information provided in the YAML file to generate a Dockerfile from a Jinja2 template. It then builds a Docker image from this Dockerfile. (Thus, each compiler version results in its own Docker image.)
    • Note: Each Dockerfile includes steps to download the respective compiler version from its official website; we added some checks that try to ensure the integrity of the downloads, but we are not responsible in any way for the downloaded files!

To simplify this process, we provide make targets. Run make docker in the root directory of the RedBench repository (or simply make in the docker/ subdirectory) to build all versions of all compilers. To only build all versions of a single compiler, run make <language>/<compiler>/docker in the docker/ subdirectory (e.g., run make c/gcc/docker to build all versions of the GCC C compiler). Note: Before the different versions of a compiler can be built, all base images have to be built (but the Makefile should handle these dependencies automatically).
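
For example:

    # build all versions of all compilers (from the repository root)
    make docker

    # build only all versions of the GCC C compiler
    cd docker
    make c/gcc/docker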

Warning: building the execution environment might take quite long (expect multiple hours).

Running a Container

When the Docker images have been built as described above, there are several helper scripts for running each compiler version in a Docker container:

  • The docker/run_container.sh script is the most generic one and is meant for interactive use. It takes the compiler name and version as command line arguments. For example, ./run_container.sh gcc 4.0.0 starts a new Docker container for GCC 4.0.0 and spawns a new shell in it. This script also provides means for copying files to and from the container:
    • To copy files to the container, specify the source path on the host as third command line argument. Files that are copied to the container can be found in its /data directory.
    • To copy files from the container once it has finished, specify the target path on the host as fourth command line argument. Files that are contained in the container's /output or /output_tmpfs directory (the latter one uses a tmpfs) are copied to the host.
  • Each compiler directory contains a script run_compiler.sh that starts a new Docker container for the given compiler version and runs it on the given program. For example, ./c/gcc/run_compiler.sh 4.0.0 <program> runs GCC 4.0.0 on the given program (where <program> is the path to the input program on the host). All additional command line arguments that are passed to this script are passed on to the compiler running in the Docker container.
    • Note: Depending on your use case, you probably do not have to run these run_compiler.sh scripts manually. Each program of the test suite (see below) comes with a script run_docker_exec.sh that automatically runs the corresponding run_compiler.sh script with proper arguments.
  • Each compiler directory also contains a script run_test.sh that runs a test function of the test suite in a new Docker container (it returns with the test function's exit code). Like the run_compiler.sh scripts, the run_test.sh scripts first take the compiler version as command line argument. Then, they either take a path to the directory of a test case or a pair of paths for a test function and an input program.
    • Note: Depending on your use case, you probably do not have to run these run_test.sh scripts manually. Each program of the test suite (see below) comes with a script test_docker_exec.sh that automatically runs the corresponding run_test.sh script with proper arguments.
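
For illustration, a host-side session in the docker/ subdirectory might look as follows; the host paths and the -O2 flag are only examples:

    # interactive container for GCC 4.0.0; ./input on the host is copied to
    # /data in the container, and /output as well as /output_tmpfs are copied
    # back to ./results once the container has finished
    ./run_container.sh gcc 4.0.0 ./input ./results

    # run GCC 4.0.0 on a single program; additional arguments (here -O2,
    # chosen only for illustration) are passed on to the compiler
    ./c/gcc/run_compiler.sh 4.0.0 path/to/prog.c -O2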

Note: There are additional scripts for running reducers in a Docker container, see below.

Test Cases

The testsuite/ subdirectory contains the benchmark programs that RedBench consists of, separated by language. Each test case is located in its own subdirectory in <language>/testcases (where <language> is either c for the C test cases or smt2 for the SMT-LIB 2 test cases). Each test case consists of the original (unreduced) program prog.<language> and a YAML file data.yml that contains the metadata.

As indicated above, the test functions that check whether a program (or reduction candidate) triggers the bug in the compiler under test are automatically generated based on the metadata for each test case. The test functions are generated from Jinja2 templates contained in the _templates subdirectories. For example, the file c/_templates/test.sh.j2 contains the template for the test functions of the C test cases.

To generate the test functions, run make testsuite in the root directory of the RedBench repository (or simply run make in the testsuite subdirectory). Note that the test suite can be built without building the execution environment (but of course you cannot run the test functions in the Docker containers without building the execution environment first).

When the test suite has been built, there are additional scripts for each test case:

  • test.sh: This is the test function, which has to be executed in the proper Docker container. It takes the path to a program as command line argument and returns with exit code 1 if this program triggers the bug in the compiler under test (otherwise, it returns with exit code 0).
  • test_docker_exec.sh: This script is meant to be run on the host. It starts a new Docker container with the proper compiler version and executes the test function in it (it uses the run_test.sh scripts explained above). It returns with the same exit code as the test function in the container. Note: You can optionally pass a path to a program on the host as command line option; in this case, the test function is applied to this program instead of the original (unreduced) one (this might be handy for testing if a reduction result really still triggers the bug).
  • run_docker_exec.sh: This script is also meant to be run on the host. It starts a new Docker container with the proper compiler version and runs it on the test program (it uses the run_compiler.sh scripts explained above). It returns with the same exit code as the compiler in the container.
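
For example, assuming the scripts are invoked from within a test case directory (the test case name and the file reduced.c are only illustrative):

    # e.g., in testsuite/c/testcases/fold-const_c_8943_117K/
    ./test_docker_exec.sh             # test the original (unreduced) program
    echo $?                           # 1 if the bug is triggered, 0 otherwise

    ./test_docker_exec.sh reduced.c   # test a reduction result instead
    ./run_docker_exec.sh              # only run the compiler on the program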

In addition, the make target also generates a directory structure with symbolic links to the test case directories that sorts the test cases based on several criteria:

  • <language>/by_bug_id: Sorts the test cases by the bug that they trigger; contains a subdirectory for each different bug.
  • <language>/by_compiler: Sorts the test cases by compiler and compiler version.
  • <language>/by_generator: Sorts the test cases by the fuzzer that generated them.
  • <language>/by_kind: Sorts the test cases into crashes and wrong results.
  • <language>/by_size: Sorts the test cases by size.

Running Reductions

We provide some helper scripts for running our RedPEG framework (which includes fine-tuned implementations of state-of-the-art test case reduction techniques) on the RedBench programs. This allows the replication of the reduction results from our research paper (but this repository also includes the final results for convenience, see below).

Prerequisites

The RedBench repository contains the RedPEG framework as a submodule (located in reduction/RedPEG); this submodule has to be set up correctly before the RedPEG reducers can be run. To do so, simply run make RedPEG in the root directory of the RedBench repository (this clones the RedPEG repository and builds the RedPEG framework).

Note: Of course, running the RedPEG reducers also requires that the execution environment and test suite have been built, see above.

Running the RedPEG Reducers

To run one or more RedPEG reducers on one or more RedBench programs, run the run_RedPEG.sh script in the reduction/ subdirectory. It takes the following command line arguments:

  • Required: Path to the directory that contains the programs that should be reduced. The script automatically determines all test cases in the given directory and reduces them one after another. Thus, the path can either point to a single test case directory (e.g., testsuite/c/testcases/fold-const_c_8943_117K/) or a directory that includes multiple test cases (this also supports symbolic links; e.g., provide testsuite/c/by_generator/starsmith/ to reduce all C programs that have been generated with the *Smith compiler fuzzer).
  • Optional: The name of the reduction run, which is used to determine the output path for the reduction results. After a (successful) reduction, the reduction results can be found in testsuite/<language>/testcases/<test case>/reduction/<reduction name>. Note that a reduction is skipped if the output path already exists. If this command line argument is not provided, the name of the reduction run is automatically set based on the current date and time.
  • Optional: Names of the reducers that should be run (see the RedPEG repository for more details). If no reducers are given, the reducers from the comparative study in our research paper are executed.
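
A minimal example (it assumes that the script is invoked from the reduction/ subdirectory; the run name my-run is arbitrary, and since no reducer names are given, the reducers from our comparative study are executed):

    ./run_RedPEG.sh ../testsuite/c/testcases/fold-const_c_8943_117K/ my-run

    # afterwards, the results can be found in
    # testsuite/c/testcases/fold-const_c_8943_117K/reduction/my-run/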

Also note the following:

  • The directory that contains the reduction results for each test case (i.e., testsuite/<language>/testcases/<test case>/reduction/) also contains a symbolic link latest that points to the latest reduction results and that is automatically updated after a (successful) reduction run. This symbolic link makes it possible to access the latest reduction results, independent of their name.
  • The run_RedPEG.sh script runs the RedPEG reducers with the --cache command line option to enable test outcome caching (see our research paper for more details).
  • Under the hood, the run_RedPEG.sh script uses the generic run_reducer.sh script that can also be used for running other reducer implementations (see below).

Running Other Reducers

The script run_reducer.sh in the reduction/ subdirectory is a generic helper script that can execute (more or less) arbitrary reducer implementations in a Docker container of the execution environment. It takes the following command line arguments (also see the run_RedPEG.sh script for an example on how to use this script):

  • Required: Path to a single test case directory (in contrast to the run_RedPEG.sh script, this script only handles one test case at a time). The test case directory is copied to /data in the Docker container.
  • Required: Path to the directory that contains the reducer implementation. This directory is copied to /reducer in the Docker container.
  • Required: Command line that should be executed in the Docker container to run the reducer implementation. The reducer should write its results to /output or /output_tmpfs (the latter uses a tmpfs); only the files in these directories are copied back to the host after the reducer has terminated (and only if the reducer has terminated successfully with exit code 0).
  • Optional: The name of the reduction run, which is used to determine the output path for the reduction results. After a (successful) reduction, the reduction results can be found in testsuite/<language>/testcases/<test case>/reduction/<reduction name>. Note that a reduction is skipped if the output path already exists. If this command line argument is not provided, the name of the reduction run is automatically set based on the current date and time.

Also note that this script sets the latest symbolic link after a successful reduction, see above.
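
A purely hypothetical invocation could look as follows; my-reducer and its reduce.sh entry point are made-up names for a custom reducer, and only the meaning of the four arguments is taken from the list above:

    # from the reduction/ subdirectory; all paths and the reducer command
    # line are illustrative
    ./run_reducer.sh \
        ../testsuite/c/testcases/fold-const_c_8943_117K/ \
        /path/to/my-reducer \
        "/reducer/reduce.sh /data/prog.c /data/test.sh /output" \
        my-reducer-run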

Reduction Results

As indicated above, this repository also contains the reduction results from our comparative study that we presented in our research paper; they can be found in the reduction/ subdirectories of the individual test cases (see above).

Updating the Statistics

Whenever the benchmark is updated (i.e., when programs are added or removed) or new reduction results are added, make update should be run in the root directory of the RedBench repository. This ensures that all statistics are updated (including the ones in the README.md) and that the plots for all new reduction results are generated.

Note: running make update might take a while.

License

RedBench is licensed under the terms of the MIT license (see LICENSE.mit).

The Python scripts contained in this repository make use of the following open-source projects (which have to be installed manually, see above):

  • Jinja (licensed under the terms of the BSD License)
  • PyYAML (licensed under the terms of the MIT License)