/provninja

Evading Provenance-Based ML Detectors with Adversarial System Actions

Reproducibility artifacts for the paper Evading Provenance-Based ML Detectors with Adversarial System Actions.

Overview

Folder structure

  • gadget-finder: Folder containing the code and data to execute the gadget-finder algorithms.
  • intrusion-detection-system: Folder containing the code and data files for IDS execution.

Environment Setup

We use conda as the Python environment manager. Install the project dependencies from provng.yml with this command:

conda env update --name provng --file provng.yml

Activate the conda environment before running the experiments:

conda activate provng

Gadget Finder

  • Gadget Finder
    • Finds the possible gadget chains between the two programs specified in input.csv.
    • A sample output is available in the output directory.

Running the gadget finder script:

python gadget-finder.py -i input.csv -p FrequencyDB/SAMPLE_WINDOWS_FREQUENCY_DB.csv -o output/gadgets.txt
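As a rough illustration of what searching for a gadget chain between two programs could look like, the sketch below treats a frequency database as a list of (parent, child, count) rows and looks for the chain with the highest product of relative frequencies. The row layout, the program names, the function name find_gadget_chain, and the scoring (negative log relative frequency, minimized with Dijkstra's algorithm) are all assumptions made for this example. They are not taken from gadget-finder.py or from the format of SAMPLE_WINDOWS_FREQUENCY_DB.csv.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical frequency rows: (parent program, child program, observed count).
SAMPLE_ROWS = [
    ("explorer.exe", "cmd.exe", 80),
    ("explorer.exe", "chrome.exe", 20),
    ("cmd.exe", "powershell.exe", 50),
    ("cmd.exe", "ping.exe", 50),
    ("chrome.exe", "powershell.exe", 1),
    ("chrome.exe", "chrome.exe", 99),
]


def find_gadget_chain(rows, source, target):
    """Return (chain, probability) for the most frequent chain, or None."""
    # Group outgoing edges per parent and accumulate their counts.
    out_edges = defaultdict(dict)
    for parent, child, count in rows:
        out_edges[parent][child] = out_edges[parent].get(child, 0) + count

    # Edge cost = -log(relative frequency), so the cheapest path is the
    # chain whose product of relative frequencies is largest.
    graph = {}
    for parent, children in out_edges.items():
        total = sum(children.values())
        graph[parent] = [
            (child, -math.log(c / total)) for child, c in children.items() if c > 0
        ]

    # Dijkstra's algorithm over the non-negative edge costs.
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for child, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(child, math.inf):
                best[child] = new_cost
                previous[child] = node
                heapq.heappush(queue, (new_cost, child))

    if target not in best:
        return None

    # Walk the predecessor links back to the source.
    chain = [target]
    while chain[-1] != source:
        chain.append(previous[chain[-1]])
    chain.reverse()
    return chain, math.exp(-best[target])


result = find_gadget_chain(SAMPLE_ROWS, "explorer.exe", "powershell.exe")
```

With the hypothetical rows above, the search prefers the chain through cmd.exe over the one through chrome.exe, because the latter includes a rarely observed transition. The script in this repository may rank or enumerate chains differently.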

Path-based IDS

SIGL[1]

  • sigl
    • Driver script for SIGL, an autoencoder-based IDS that detects anomalous paths.
    • Sample causal paragraphs and feature vectors for Enterprise APT are available in the sample-enterprise-data directory.

Running the SIGL script:

python sigl.py

ProvDetector[2]

  • provdetector
    • Driver script for ProvDetector, an LOF-based (Local Outlier Factor) IDS that detects anomalous paths.
    • Sample causal paragraphs and feature vectors for Enterprise APT are available in the sample-enterprise-data directory.

Running the ProvDetector script:

python provdetector.py
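To make the LOF-based scoring mentioned above more concrete, here is a minimal pure-Python sketch of the Local Outlier Factor computation on small toy vectors. It only illustrates the scoring rule. The function name local_outlier_factor and the toy points are assumptions for this example, and provdetector.py may compute its features and scores differently (for instance, through a library implementation).

```python
import math


def local_outlier_factor(points, k):
    """Return one LOF score per point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance and k-nearest-neighbour set for every point (ties included).
    k_distance = []
    neighbours = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_distance.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_distance[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(math.inf if mean_reach == 0 else 1.0 / mean_reach)

    # LOF: mean ratio of each neighbour's density to the point's own density.
    # Equal densities (including two infinite ones) count as a ratio of 1.
    scores = []
    for i in range(n):
        ratios = [1.0 if lrd[j] == lrd[i] else lrd[j] / lrd[i] for j in neighbours[i]]
        scores.append(sum(ratios) / len(ratios))
    return scores


# Toy feature vectors: four points in a tight cluster and one far away.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
toy_scores = local_outlier_factor(toy_points, k=2)
```

In this toy setup the four clustered points score close to 1, while the distant point scores far higher, which is the signal an LOF-based detector thresholds on.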

Graph-based IDS

S-GAT

  • S-GAT
    • Driver script for S-GAT, a GNN-based IDS that detects anomalous graphs using graph structure and attributes, e.g., node/edge types.
    • Run download_sample_supply_chain_data.sh to download and unzip the sample Supply-Chain APT data from Google Drive.
    • The weighted average F1 score on the provided data with the provided model should be 0.88.

Running the S-GAT script:

python gnnDriver.py gat -if 5 -hf 10 -lr 0.001 -e 20 -n 5 -bs 128 -bi -s

Prov-GAT

  • Prov-GAT
    • Driver script for Prov-GAT, a GNN-based IDS that detects anomalous graphs using node and edge attributes on top of the features used by S-GAT.
    • Run download_sample_supply_chain_data.sh to download and unzip the sample Supply-Chain APT data from Google Drive.
    • The weighted average F1 score on the provided data with the provided model should be 0.95.

Running the Prov-GAT script:

python gnnDriver.py gat -if 768 -hf 10 -lr 0.001 -e 20 -n 5 -bs 128 -bi
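Both GNN detectors above are checked against a weighted average F1 score. The sketch below shows how that metric is commonly computed, assuming the usual definition: per-class F1 scores averaged with each class's support (number of true samples) as the weight, as in scikit-learn's average="weighted" option. The function name weighted_f1 and the labels are hypothetical, and gnnDriver.py may compute the reported number through a library call instead.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        # F1 is defined as 0 when there are no true positives.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Hypothetical graph-level labels and predictions.
labels = ["benign"] * 4 + ["anomalous"] * 2
predictions = ["benign", "benign", "benign", "anomalous", "anomalous", "benign"]
score = weighted_f1(labels, predictions)
```

Because the weights follow class support, a detector that does well on the majority class can still report a high weighted F1, so per-class scores are worth inspecting alongside it.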

ProvNinja Graph

Running the ProvNinja-Graph script:

python provninjaGraph.py

Citing us

@inproceedings{mukherjee2023sec,
	title        = {Evading Provenance-Based ML Detectors with Adversarial System Actions},
	author       = {Kunal Mukherjee and Josh Wiedemeier and Tianhao Wang and James Wei and Feng Chen and Muhyun Kim and Murat Kantarcioglu and Kangkook Jee},
	year         = 2023,
	booktitle    = {Proceedings of USENIX Security Symposium (SEC)},
	series       = {USENIX '23}
}

References

[1] X. Han, X. Yu, T. Pasquier, et al., “SIGL: Securing Software Installations Through Deep Graph Learning,” in USENIX Security Symposium (SEC), 2021.
[2] Q. Wang, W. U. Hassan, D. Li, et al., “You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis,” in Network and Distributed System Security Symposium (NDSS), Feb. 2020.