VulnerabilityClassifier

Severity scoring and exploit categorisation for vulnerability reports using machine-learning tools.

Primary language: Jupyter Notebook. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Automated Vulnerability Scoring and Categorisation Toolset for Vulnerability Reports.

About the Tool

VulnerabilityClassifier is an open-source toolkit that uses machine-learning techniques to learn the vulnerability labels assigned by NVD, vendors, cvedetails, and other repositories, and then predicts labels for new vulnerability reports. Here, "labels" refers to CVSS-metric labels, threat types provided by cvedetails, weakness types provided by CWE, and attack types provided by CAPEC. The aim is to support a higher level of automation in vulnerability assessment.

Datasets for CWE, CAPEC, CVSS, and threat classification training are generated in a separate repo: NVD Data Feature Analysis.

The recommended environment is Python 3. The tutorials require Jupyter Notebook (available through Anaconda Navigator).

Severity Prediction Under CVSS V3

The goal is to automatically assign a severity score, under the CVSS Version 3 standard, to any vulnerability instance that has a descriptive report. Two examples are shown below: the TestingSamples have their labels initially set to placeholder values (CVSS score = 0, other values "l"), and the labels of the PredictedSamples are predicted by the trained machine-learning models.

System

Tutorial

The CVSS V3 Notebook walks step by step through a severity computation pipeline that streamlines machine-learning model training, testing, and validation.

  • Machine-learning model: Logistic Regression is used to demonstrate the applicability of the approach. Other machine-learning models can be substituted and may improve performance.
  • Training/Testing dataset: NVD data feeds (2002-2020).
  • Validation dataset: NVD data feeds (2021).
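
As a rough illustration of the approach, the sketch below trains a logistic regression classifier on bag-of-words features taken from vulnerability descriptions. It is dependency-free for clarity and is not the notebook's code: the function names (`tokenize`, `train_logreg`, `predict_proba`), the toy descriptions, and the binary "network attack vector" label are all assumptions made for this example. In practice, a library implementation trained on the NVD feeds would take its place.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, features):
    """Linear score of a bag-of-words feature vector."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in features.items())


def train_logreg(samples, epochs=200, lr=0.1):
    """Fit binary logistic regression with stochastic gradient descent.

    samples: list of (description, label) pairs, label being 0 or 1.
    Returns (weights, bias), where weights maps token -> coefficient.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            features = Counter(tokenize(text))
            error = label - sigmoid(score(weights, bias, features))
            bias += lr * error
            for tok, n in features.items():
                weights[tok] = weights.get(tok, 0.0) + lr * error * n
    return weights, bias


def predict_proba(model, text):
    """Probability that the description belongs to class 1."""
    weights, bias = model
    return sigmoid(score(weights, bias, Counter(tokenize(text))))


# Made-up descriptions for illustration only; label 1 = network attack vector.
TOY_SAMPLES = [
    ("remote attacker sends crafted HTTP request over the network", 1),
    ("local user gains privileges via crafted file", 0),
    ("network packet triggers buffer overflow in remote service", 1),
    ("local attacker with physical access reads memory", 0),
]

model = train_logreg(TOY_SAMPLES)
```

Calling `predict_proba(model, "remote attacker over the network")` returns a probability above 0.5 for this toy model. A multi-class metric would need one such classifier per class or a softmax variant.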

Local Usage

  • Step 1: Clone the repo using the following command:
git clone https://github.com/Yuni0217/VulnerabilityClassifier.git 
  • Step 2: Create a virtual environment.

  • Step 3: Install requirements using pip:

pip install -r requirements.txt
  • Step 4: Download datasets from NVD feeds.
python ./CVSSV3prediction/updateDB.py
  • Step 5: Train machine-learning models for different CVSS V3 mechanisms and store them.
python ./CVSSV3prediction/trainScoreCVSSV3.py
  • Step 6: Use the trained machine-learning models to predict CVSS V3 scores for any vulnerability document.
python ./CVSSV3prediction/predictScoreCVSSV3.py -p './CVSSV3prediction/testData' -s -v
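
If the trained models predict the individual CVSS V3 base metrics, a numeric score can be derived from those predictions with the public CVSS v3.1 base score equations. The sketch below assumes that workflow; whether `predictScoreCVSSV3.py` computes scores this way is not confirmed here, and the function names are hypothetical. The metric weights and the rounding rule follow the FIRST CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, using the integer-based rule of v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Compute the CVSS v3.1 base score from single-letter metric values."""
    changed = scope == "C"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def severity(score):
    """Map a base score to its qualitative severity rating."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

For example, the vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H corresponds to `base_score("N", "L", "N", "N", "U", "H", "H", "H")`, which evaluates to 9.8 and is rated Critical.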

Severity Prediction Under CVSS V2

Similarly, the vulnerability severity score under CVSS Version 2 can be predicted using a trained machine-learning model.

System

Tutorial

The model training, testing, and validation process is illustrated step by step in the CVSS V2 Notebook.

  • Machine-learning model: Logistic Regression.
  • Training/Testing dataset: NVD data feeds (2002-2020).
  • Validation dataset: NVD data feeds (2021).
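
For reference, the CVSS Version 2 base score is computed from six base metrics. The sketch below implements the public CVSS v2 base equation, assuming the metric values come from per-metric predictions; the function name is hypothetical and the code is not taken from the notebook. Python's built-in `round` stands in for the specification's round-to-one-decimal step.

```python
# Metric weights from the CVSS v2 specification.
AV2 = {"L": 0.395, "A": 0.646, "N": 1.0}     # Access Vector
AC2 = {"H": 0.35, "M": 0.61, "L": 0.71}      # Access Complexity
AU2 = {"M": 0.45, "S": 0.56, "N": 0.704}     # Authentication
CIA2 = {"N": 0.0, "P": 0.275, "C": 0.660}    # Conf./Integ./Avail. impact


def base_score_v2(av, ac, au, c, i, a):
    """Compute the CVSS v2 base score from single-letter metric values."""
    impact = 10.41 * (1 - (1 - CIA2[c]) * (1 - CIA2[i]) * (1 - CIA2[a]))
    if impact == 0:
        # f(Impact) is 0 when there is no impact, so the score is 0.
        return 0.0
    exploitability = 20 * AV2[av] * AC2[ac] * AU2[au]
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * 1.176, 1)
```

For example, the vector AV:N/AC:L/Au:N/C:P/I:P/A:P corresponds to `base_score_v2("N", "L", "N", "P", "P", "P")`, which evaluates to 7.5.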

Threat Prediction Using CVEDetails

The threat categories that a vulnerability might be exposed to can be predicted using a trained machine-learning model. The accuracy shown below was obtained without any optimisation.

System

Tutorial

The model training, testing, and validation process is illustrated in the Threat Prediction Notebook.

  • Machine-learning model: LSTM Model.
  • Training/Testing dataset: NVD data feeds (2002-2021); cvedetails.

Before using the Threat Prediction Notebook tutorial, you can also synchronise the data with the latest vulnerability data feeds and create mappings between CVEs and threat types in cvedetails with the following scripts:

python ./threatPrediction/updateDB.py
python ./threatPrediction/cveIDcrawler_in_cveDetails.py
python ./threatPrediction/generateThreatTrainingData.py
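
Because one vulnerability can fall into several threat categories, training data for this task is naturally multi-label. The sketch below shows one way to join CVE descriptions with a CVE-to-threat-type mapping and encode the result as multi-hot vectors. The category list, the dictionary layout, and the function names are assumptions for illustration and do not describe the output format of the scripts above.

```python
# Illustrative category list; the real set of threat types may differ.
THREAT_TYPES = [
    "Denial of Service",
    "Code Execution",
    "Overflow",
    "SQL Injection",
    "XSS",
]


def multi_hot(labels, vocabulary=THREAT_TYPES):
    """Encode a collection of threat types as a 0/1 vector over vocabulary."""
    present = set(labels)
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown threat types: {sorted(unknown)}")
    return [1 if name in present else 0 for name in vocabulary]


def build_training_rows(descriptions, threat_map):
    """Join descriptions and threat mappings on CVE ID.

    descriptions: dict of CVE ID -> description text.
    threat_map:   dict of CVE ID -> list of threat type names.
    CVEs without a threat mapping are skipped.
    Returns a list of (cve_id, description, multi_hot_vector) tuples.
    """
    rows = []
    for cve_id in sorted(descriptions):
        if cve_id in threat_map:
            rows.append(
                (cve_id, descriptions[cve_id], multi_hot(threat_map[cve_id]))
            )
    return rows
```

Each resulting vector can serve as the target of a multi-label classifier, with one output unit per threat type.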

Future Work

  • More classification work will be added, covering weakness types provided by CWE and attack types provided by CAPEC.
  • The prediction models for different purposes (threat categorisation, CVSS-metric categorisation, CWE classification) will be wrapped into a single pipeline.

Cite

If you use this tool in your academic work, you can cite it using:

@article{jiang2022towards,
  title={Towards automatic discovery and assessment of vulnerability severity in cyber--physical systems},
  author={Jiang, Yuning and Atif, Yacine},
  journal={Array},
  volume={15},
  pages={100209},
  year={2022},
  publisher={Elsevier}
}