`dataset` is the CRS module that compiles and manages the vulnerable programs analyzed by the CRS.
The supported test suites are:
- NIST's Juliet;
- NIST's C Test Suite;
- a toy dataset.
The built executables are limited to:
- the ELF format;
- the x86 architecture.
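
One way to confirm that a built file respects the ELF/x86 limitation above is to inspect its ELF header. The following is a minimal sketch, not part of the module. It assumes 32-bit little-endian objects, which is what a 32-bit x86 toolchain produces. The function names `is_x86_elf32` and `check_executable` are hypothetical.

```python
import struct

# Field values from the ELF specification (System V ABI)
ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1   # e_ident[EI_CLASS]: 32-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA]: little-endian encoding
EM_386 = 3       # e_machine: Intel 80386 (x86)


def is_x86_elf32(header: bytes) -> bool:
    """Return True if `header` begins with a 32-bit little-endian x86 ELF header."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    if header[4] != ELFCLASS32 or header[5] != ELFDATA2LSB:
        return False
    # e_machine is a 2-byte field at offset 18 (after e_ident[16] and e_type[2])
    (machine,) = struct.unpack_from("<H", header, 18)
    return machine == EM_386


def check_executable(path: str) -> bool:
    """Read the first 20 bytes of `path` and check its ELF header."""
    with open(path, "rb") as f:
        return is_x86_elf32(f.read(20))
```

For example, `check_executable("executables/toy_test_suite_0.elf")` would be expected to return `True` if that file is a 32-bit x86 ELF executable.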
The module performs the following steps for each test suite to be built:
- Getting the available sources into the test suite's folder
- Preprocessing the sources to include all the required sources and headers
- Writing the preprocessed sources into the `sources` folder at the root of the repository
- Creating a new entry in the dataset's CSV files, namely `vulnerables.csv`
- Filtering the sources based on the wanted CWEs
- Compiling the preprocessed sources with compile and link flags from multiple origins (the module's own and user-provided ones)
- Writing the executables into the `executables` folder at the root of the repository
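
The CSV-entry and CWE-filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not the module's code. The `VulnerableSource` record, the column order, CWE identifiers such as `CWE-121`, and the `;` separator for multiple CWEs are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class VulnerableSource:
    """Hypothetical record describing one preprocessed source file."""
    id: str
    parent: str
    path: str
    cwes: list[str] = field(default_factory=list)


def append_entries(csv_file, sources):
    """Write one CSV row per source (hypothetical column layout)."""
    writer = csv.writer(csv_file)
    for src in sources:
        # Several CWEs are joined into a single cell with ";"
        writer.writerow([src.id, ";".join(src.cwes), src.parent, src.path])


def filter_by_cwes(sources, wanted_cwes):
    """Keep only the sources tagged with at least one of the wanted CWEs."""
    wanted = set(wanted_cwes)
    return [src for src in sources if wanted.intersection(src.cwes)]


if __name__ == "__main__":
    sources = [
        VulnerableSource("toy_test_suite_0", "toy_test_suite", "sources/toy_test_suite_0.c", ["CWE-121"]),
        VulnerableSource("toy_test_suite_1", "toy_test_suite", "sources/toy_test_suite_1.c"),
        VulnerableSource("toy_test_suite_2", "toy_test_suite", "sources/toy_test_suite_2.c", ["CWE-476"]),
    ]
    buffer = io.StringIO()  # stands in for vulnerables.csv
    append_entries(buffer, sources)
    print(buffer.getvalue(), end="")
    # Only the filtered sources would then be compiled
    print([src.id for src in filter_by_cwes(sources, ["CWE-476"])])
```

For a real file, `append_entries` would receive a handle opened with `open("vulnerables.csv", "a", newline="")`. The `newline=""` argument is what the `csv` module expects.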
All `gcc` operations are performed inside a 32-bit Ubuntu 18.04 Docker container.
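
The following sketch shows how a compilation could be delegated to that container. It shells out to the `docker` CLI for brevity. The module itself talks to the Docker API, and its exact invocation is not documented here. The image tag `ubuntu_32bit_compilator` comes from the setup steps. The `/work` mount point, the flag dictionaries, and all function names are hypothetical. User flags are placed after the module's own so that, for options where gcc honors the last occurrence (such as `-O` levels), the user's choice wins.

```python
import os
import shlex
import subprocess

IMAGE = "ubuntu_32bit_compilator"  # image tag from the setup instructions
WORKDIR = "/work"  # hypothetical mount point inside the container


def gcc_command(source, output, compile_flags, link_flags):
    """Build a gcc call: compile flags before the source, link flags at the end."""
    return ["gcc", *compile_flags, source, "-o", output, *link_flags]


def docker_command(host_dir, inner_command):
    """Wrap a command so it runs in a throwaway container with host_dir mounted."""
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(host_dir)}:{WORKDIR}",
        "-w", WORKDIR,
        IMAGE,
        *inner_command,
    ]


def compile_in_container(host_dir, source, output, module_flags, user_flags):
    """Merge the module's and the user's flags, then run gcc in the container.

    `source` and `output` are paths relative to `host_dir`.
    """
    compile_flags = module_flags.get("compile", []) + user_flags.get("compile", [])
    link_flags = module_flags.get("link", []) + user_flags.get("link", [])
    command = docker_command(host_dir, gcc_command(source, output, compile_flags, link_flags))
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```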
To set up the module:
- Download the repository into `/opencrs/dataset`. To use another path, modify the corresponding configuration parameter.
- Ensure that the repository's submodules (which are the test suites) are downloaded too. If you clone the repository, pass the `--recurse-submodules` flag to download them as well.
- Install the required Python 3 packages via `poetry install --no-dev`.
- Build the Docker image from the repository's root: `docker build --tag ubuntu_32bit_compilator -f docker/Dockerfile.ubuntu_32bit_compilator .`
- Ensure the Docker API is accessible by either:
  - running the module as `root`; or
  - changing the Docker socket permissions (an insecure approach) via `chmod 777 /var/run/docker.sock`.
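
Before running the module as a regular user, you can quickly check whether the Docker socket is usable. The following is a minimal sketch. It only tests file permissions on the default socket path from the step above and does not confirm that the Docker daemon is running. The function name `docker_socket_accessible` is hypothetical.

```python
import os

DOCKER_SOCKET = "/var/run/docker.sock"  # socket path from the setup steps


def docker_socket_accessible(path: str = DOCKER_SOCKET) -> bool:
    """Return True if the current user may read and write the Docker socket.

    Only file permissions are checked; this does not confirm that the
    Docker daemon is running.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if docker_socket_accessible():
        print("The Docker socket is accessible.")
    else:
        print("Cannot access the Docker socket: run as root or adjust its permissions.")
```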
The module can be run as a command-line tool:
➜ poetry run dataset build --testsuite TOY_TEST_SUITE
Successfully built 5 executables.
➜ poetry run dataset get
The available executables are:
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID               ┃ CWEs                        ┃ Parent Database ┃ Full Path                        ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ toy_test_suite_0 │ Stack-based Buffer Overflow │ toy_test_suite  │ executables/toy_test_suite_0.elf │
│ toy_test_suite_1 │                             │ toy_test_suite  │ executables/toy_test_suite_1.elf │
│ toy_test_suite_2 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_2.elf │
│ toy_test_suite_3 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_3.elf │
│ toy_test_suite_4 │                             │ toy_test_suite  │ executables/toy_test_suite_4.elf │
└──────────────────┴─────────────────────────────┴─────────────────┴──────────────────────────────────┘
➜ poetry run dataset
Usage: dataset [OPTIONS] COMMAND [ARGS]...
  Builds and filters datasets of vulnerable programs
Options:
  --help  Show this message and exit.
Commands:
  build  Builds a test suite.
  show   Gets the executables in the whole dataset.
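The module can also be used as a Python library:
from dataset import Dataset
# Retrieve the executables available in the built dataset
available_executables = Dataset().get_available_executables()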
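
The library call above does not document the type returned by `get_available_executables()`. Assuming each executable carries the fields shown in the CLI table, the result could be grouped by CWE as in the sketch below. `Executable`, `group_by_cwe`, and `TOY` are hypothetical names, and the rows are copied from the toy test suite output shown earlier.

```python
from collections import defaultdict
from typing import NamedTuple


class Executable(NamedTuple):
    """Hypothetical stand-in for one row of the table printed by the CLI."""
    id: str
    cwes: tuple[str, ...]
    parent: str
    path: str


def group_by_cwe(executables):
    """Map each CWE to the IDs of the executables tagged with it."""
    groups = defaultdict(list)
    for exe in executables:
        for cwe in exe.cwes:
            groups[cwe].append(exe.id)
    return dict(groups)


# Rows copied from the toy test suite output shown earlier
TOY = [
    Executable("toy_test_suite_0", ("Stack-based Buffer Overflow",), "toy_test_suite", "executables/toy_test_suite_0.elf"),
    Executable("toy_test_suite_1", (), "toy_test_suite", "executables/toy_test_suite_1.elf"),
    Executable("toy_test_suite_2", ("NULL Pointer Dereference",), "toy_test_suite", "executables/toy_test_suite_2.elf"),
    Executable("toy_test_suite_3", ("NULL Pointer Dereference",), "toy_test_suite", "executables/toy_test_suite_3.elf"),
    Executable("toy_test_suite_4", (), "toy_test_suite", "executables/toy_test_suite_4.elf"),
]

if __name__ == "__main__":
    print(group_by_cwe(TOY))
    # Executables whose CWEs column is empty
    print([exe.id for exe in TOY if not exe.cwes])
```

Running it groups `toy_test_suite_2` and `toy_test_suite_3` under "NULL Pointer Dereference" and lists `toy_test_suite_1` and `toy_test_suite_4` as having no CWE.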