Code, Data and Results for "Limits of machine learning for automatic vulnerability detection"

Structure

Below is an annotated map of the directory structure of this repository.

.
├── scripts................................ Scripts to exactly reproduce all experiments presented in our paper.
│   └── <dataset>.......................... One directory for each dataset (CodeXGLUE, VulDeePecker).
│       └── <technique>.................... One directory for each ML technique (VulBERTa, CoTexT, PLBart)
│           ├── run.py..................... Compute experimental results for selected ML technique and dataset.
│           └── run_at.py.................. Compute adversarial training results for selected ML technique and dataset. Only for CodeXGLUE.
│
├── datasets............................... All datasets that we use in the experiments for the paper + our own created dataset VulnPatchPairs.
│   └── README.md.......................... Instructions for downloading all datasets used in this repository.
│
├── models................................. All pretrained models that are not downloaded in the training scripts.
│   └── README.md.......................... Instructions for downloading all models used in this repository.
│
├── plots.................................. Generates all plots shown in the paper.
│   ├── generate_plots.py.................. Script that generates all plots from the experimental results.
│   └── plots.............................. The produced plots and tables that can be found in the paper.
│
├── additional_experiments................. Additional experiments presented in the paper.
│   └── naturalness........................ Additional experiments on naturalness of transformations.
│       └── run.py......................... Run additional experiment on naturalness.
│
├── install_requirements.sh................ Script to install Python environment and required packages.
├── requirements.txt....................... All Python packages that you need to run the experiments.
│
├── run_experiments.sh..................... Script to reproduce all experiments presented in our paper.
│
└── README.md

Setup

Step 1: Install Anaconda

Anaconda is an open-source package and environment management tool for Python. Instructions for Installation can be found here.

Step 2: Install Requirements

We assume that you have Anaconda installed.

Running the following script from the root directory of this repository creates a virtual environment in Anaconda, and installs the required Python packages.

bash install_requirements.sh

Activate the environment with the following command.

conda activate LimitsOfMl4Vuln

Step 3: Download the required datasets

Go to datasets/README.md and follow the instructions to download all datasets needed to run our experiments.

Step 4: Download the required models

Go to models/README.md and follow the instructions to download all models needed to run our experiments.

Step 5: Ready to go

Run

bash run_experiments.sh

to reproduce all experimental results presented in our paper. The script also serves as an entry point into the scripts for the different experiments.

niklasrisse/LimitsOfML4Vuln