This repository contains code and notebooks that demonstrate inference provenance capture and characterization for deep neural networks. Capture is performed in benign and adversarial settings for image classifiers (dataset: MNIST) and malware detection models (datasets: CuckooTraces and EMBER).
- Datasets: MNIST is automatically loaded via Keras. To test ProCapture on malware data, you need to download the [CuckooTraces](link here) and [EMBER](link here) datasets and add them to the `./data/` folder. By default, ProCapture will look for them at that path.
- Pre-trained Models: We offer pre-trained models: `mnist_1`, `mnist_2`, `mnist_3`, `cuckoo_1`, and `ember_1`. These models are available to download here. Once they are downloaded to the `ProCapture/models/` directory, the `model.txt` file provides the architecture details of each model.
- Attacks: As of now, ProCapture supports the following attacks:
  - MNIST: Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Auto-PGD with the DLR loss function (APGD-DLR), and Square.
  - CuckooTraces: an attack that progressively flips up to the first n 0-bits to 1 until it evades the model (we name this attack `CKO`).
  - EMBER: an attack that progressively perturbs features within valid value ranges/options until the model changes its prediction from malware to benign (we name this attack `EMBER`).
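The CKO-style evasion loop described above can be sketched as follows. This is a minimal illustration only, not ProCapture's implementation: `model_predict` and the toy model are stand-ins for a real malware classifier.

```python
# Hypothetical sketch of a CKO-style evasion loop: flip the first n zero
# bits of a binary feature vector to 1, one at a time, until the model's
# prediction changes from malware (1) to benign (0).
def cko_attack(x, model_predict, max_flips):
    x = list(x)
    flips = 0
    for i, bit in enumerate(x):
        if model_predict(x) == 0:      # benign prediction: evasion succeeded
            break
        if bit == 0 and flips < max_flips:
            x[i] = 1                   # flip a zero bit to one
            flips += 1
    return x

# Toy stand-in model: predicts "malware" (1) while fewer than three 1-bits are set
toy = lambda v: 1 if sum(v) < 3 else 0
print(cko_attack([0, 1, 0, 0, 1], toy, max_flips=4))  # -> [1, 1, 0, 0, 1]
```

The loop stops as soon as the prediction flips, so only as many bits are perturbed as the evasion actually requires.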
```
$ git clone https://github.com/um-dsp/ProCapture.git
$ cd ProCapture
$ pip install -r requirements.txt
```
`activations_extractor.py`: The first step of our characterization approach is to extract the activations of the target model. It takes the following parameters, in the given order:
- Dataset Name: mnist | cuckoo | ember
- Pre-Trained Model Name: cuckoo_1 | ember_1 | mnist_1 | mnist_2 | mnist_3
- Folder: Ground_Truth | Benign | Adversarial. It is required to use these exact folder names. This parameter sets the folder in which the generated activations will be saved; the default file path is `folder/dataset_name/model_name/<attack>/`. (Note: make sure to create the folder at the above path before running the activation generation.)
- Attack Name: FGSM | APGD-DLR | PGD | square | CKO | EMBER | None. (Note: this parameter is optional. If specified, ProCapture will apply the attack to the dataset. If the attack is None and the folder input is set to Adversarial, an error is thrown.)
- stop: specifies the number of batches of graphs to generate (1000 graphs per batch).
- task: the default is to compute the empirical characterization and extract graphs from the model inputs.
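For example, the expected activations directory can be pre-created with a short script. The folder, dataset, model, and attack values below are illustrative, following the `folder/dataset_name/model_name/<attack>/` layout:

```python
import os

# Illustrative values following the folder/dataset_name/model_name/<attack>/ layout
folder, dataset, model_name, attack = "Adversarial", "mnist", "mnist_1", "FGSM"
path = os.path.join(folder, dataset, model_name, attack)
os.makedirs(path, exist_ok=True)  # exist_ok avoids an error if it already exists
print(path)
```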
Sample commands:
```
python activations_extractor.py -dataset mnist -model_name mnist_1 -folder Ground_Truth_pth -model_type pytorch -task default
python activations_extractor.py -dataset mnist -model_name mnist_1 -folder Benign_pth -model_type pytorch -task default
python activations_extractor.py -dataset mnist -model_name mnist_1 -folder Benign_pth -model_type pytorch -task default -attack FGSM
```
The model activations of the ground-truth, benign test, and adversarial data are stored in their respective folders, in text files named `[true label]_[predicted label]_[index]` (e.g., `0_0_150.txt`). Each file contains the values of every node in every layer of the model for that specific sample.
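The naming convention above can be decoded with a small helper. This is a sketch; `parse_activation_name` is a hypothetical function, not part of ProCapture:

```python
# Decode the [true label]_[predicted label]_[index] naming used for the
# saved activation files (e.g. "0_0_150.txt").
def parse_activation_name(filename):
    stem = filename.rsplit(".", 1)[0]          # drop the .txt extension
    true_label, pred_label, index = (int(p) for p in stem.split("_"))
    return true_label, pred_label, index

print(parse_activation_name("0_0_150.txt"))    # -> (0, 0, 150)

# A mismatch between the first two fields marks a misclassified sample:
true_l, pred_l, _ = parse_activation_name("3_8_42.txt")
print(true_l != pred_l)                        # -> True
```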
Use Empirical_Characterization.ipynb to compute the proposed graph-related metrics for empirical characterization.
`train_on_graph.py`: We train another model, called `graph_model`, that learns the GNN of the target model. This model should also be initiated and stored in a `.pth` file.
To train and test a predefined model on the activations set, use the CLI command with the following parameters: Dataset Name, Model Name, and Attack (these arguments are only used to locate the needed activations in the project folder). The command also expects the following argument:
- Model Path: the path at which to save the feature-extraction model, which is trained to separate activations of benign and adversarial samples. (Note: We provide models pre-trained on Ground_Truth benign and FGSM data, and tested on Benign and FGSM test data.)
Sample commands:
To train the GNN and save it using the generated graphs, use the following command:
```
python train_on_graph.py -dataset mnist -model_name mnist_2 -folder Ground_Truth_pth -model_type pytorch -task graph -attack FGSM -epochs 5 -save True
```
To explain the GNN and visualize the structured attributions of the graphs, use the following commands:
```
python train_on_graph.py -dataset mnist -model_name mnist_2 -folder Ground_Truth_pth -model_type pytorch -task GNN_explainer -model_path models/GNN_mnist_2_FGSM_pytorch -attack FGSM -expla_mode Saliency -attr_folder data/attributions_data/
python train_on_graph.py -dataset mnist -model_name mnist_2 -folder Ground_Truth_pth -model_type pytorch -task GNN_explainer -model_path models/GNN_mnist_2_FGSM_pytorch -expla_mode Saliency -attr_folder data/attributions_data/
```
`gen_attributions.py`: this file shows how to transform the generated activations into a dataset and train a torch adversarial-detection model. `attributionUtils.py` holds different predefined architectures that cover all the datasets we study and produce satisfactory performance. Attributions: in the same file we showcase the steps to generate the attributions of the models on a batch of inputs.
`Attribution.py`: The third step is to perform attributions on the trained `graph_model`. This file should include a function that takes as input a `graph_model` and a `data_name`; the model is imported automatically from `Models/GNN_[model_name]_[attack]_[model_type]`. Example: `Models/GNN_mnist_1_FGSM_pytorch.pth`.
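The checkpoint path convention above can be reproduced with a short helper. This is a sketch; `gnn_checkpoint_path` is a hypothetical name, not part of the repository:

```python
# Build a path following the Models/GNN_[model_name]_[attack]_[model_type].pth
# convention described above.
def gnn_checkpoint_path(model_name, attack, model_type):
    return f"Models/GNN_{model_name}_{attack}_{model_type}.pth"

print(gnn_checkpoint_path("mnist_1", "FGSM", "pytorch"))
# -> Models/GNN_mnist_1_FGSM_pytorch.pth
```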
Use Structured_Characterization.ipynb to compute the proposed graph-related metrics for structural characterization.
After generating the empirical and structural characterizations, you can run the robustness-enhancement step with a command like the following, specifying the attack, the dataset, and the benign threshold for the model at hand:
```
python3 main.py -dataset ember -model_name ember_1 -folder Ground_Truth_pth -attack EMBER -expla_mode Saliency -ben_thresh 90 -attr_folder data/attributions_data/
```