Code for "Detecting Adversarial Samples from Artifacts" (Feinman et al., 2017)

Detecting Adversarial Samples from Artifacts

This repository contains the code for the paper Detecting Adversarial Samples from Artifacts (Feinman et al., 2017).

Requirements & Setup

This repository requires Keras > 2.0 and TensorFlow, and Keras must be configured to use the TensorFlow backend. The full list of requirements is in requirements.txt. To install, clone the repository into a folder of your choice:

git clone https://github.com/rfeinman/detecting-adversarial-samples.git

On UNIX machines, after cloning this repository, it is recommended that you add the path to the repository to your PYTHONPATH environment variable to enable imports from any folder:

export PYTHONPATH="/path/to/detecting-adversarial-samples:$PYTHONPATH"

Code Structure

The source code is located in the detect/ subfolder, and scripts that users will run to perform various steps are located in the scripts/ subfolder. An empty subfolder, data/, is included for storing trained models and adversarial sample arrays. Instructions for running the code are below.
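One way to confirm that the PYTHONPATH change took effect is to ask Python whether it can locate the source package. The following is a minimal sketch, assuming the detect/ subfolder is importable as a top-level package named detect; the helper name is_importable is hypothetical and not part of the repository.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate a top-level package or module."""
    return importlib.util.find_spec(package_name) is not None


# Assumption: the repository root is on PYTHONPATH, so "detect" resolves.
if not is_importable("detect"):
    print("detect not found; check that PYTHONPATH includes the repository root")
```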

Running the Code

All of the scripts referenced below are located in the scripts/ subfolder. The commands are written as they would be run from inside that folder.

1. Train a new model

To train a new model for a particular data set, simply run

python train_model.py -d=<dataset> -e=<nb_epochs>

where <dataset> is one of 'mnist', 'cifar' or 'svhn', and <nb_epochs> is an integer indicating the number of epochs to train for. We recommend using 10 epochs for MNIST, and 60 for each of CIFAR and SVHN. For example, to train the MNIST model for 10 epochs, we would run

python train_model.py -d=mnist -e=10

The model will be trained and saved into the data/ subfolder and named model_<dataset>.h5. An optional batch size parameter is also available, specified with -b=<batch_size>. The default training batch size is 128.

2. Craft adversarial samples

To craft adversarial samples for a particular data set, you must first train the model for that data set (details above). Then, simply run

python craft_adv_samples.py -d=<dataset> -a=<attack>

where <dataset> is the same as above and <attack> is one of 'fgsm', 'jsma', 'bim-a', 'bim-b' or 'all', indicating which method to use to craft adversarial samples. For example, to craft adversarial samples for the MNIST model using FGSM, we would run

python craft_adv_samples.py -d=mnist -a=fgsm

If 'all' is chosen (the default), all types of adversarial samples will be generated. Arrays holding the adversarial samples are stored in the data/ subfolder and named Adv_<dataset>_<attack>.npy. An optional batch size parameter for evaluating adversarial samples is again provided (-b=<batch_size>). The default is 256.
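Because the detection step depends on files written by the first two steps, it can help to check for them before continuing. The sketch below builds the expected file names from the naming scheme described above. The helper names are hypothetical, and it assumes that choosing 'all' produces one array file per attack.

```python
from pathlib import Path

# The four attack names listed above; 'all' is assumed to expand to these.
ATTACKS = ("fgsm", "jsma", "bim-a", "bim-b")


def expected_artifacts(data_dir, dataset, attack="all"):
    """Return the paths that steps 1 and 2 are expected to have written."""
    data_dir = Path(data_dir)
    attacks = ATTACKS if attack == "all" else (attack,)
    paths = [data_dir / f"model_{dataset}.h5"]
    paths += [data_dir / f"Adv_{dataset}_{a}.npy" for a in attacks]
    return paths


def missing_artifacts(data_dir, dataset, attack="all"):
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_artifacts(data_dir, dataset, attack)
            if not p.exists()]
```

For example, missing_artifacts("data", "mnist", "fgsm") returns an empty list once both data/model_mnist.h5 and data/Adv_mnist_fgsm.npy are present.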

3. Detect adversarial samples

To run the detection script, you must first train the model and craft adversarial samples for each data set you would like to use (details above). Then, simply run

python detect_adv_samples.py -d=<dataset> -a=<attack>

where <dataset> and <attack> are the same as described above. An optional batch size parameter is again provided (-b=<batch_size>). For all of the adversarial samples provided, an equal number of noisy samples will be generated and included alongside the original samples as part of the 'negative' class for the detector. The perturbation size of these noisy samples is determined based on the average L2 perturbation size of the adversarial samples. Then, the Bayesian uncertainty and kernel density features will be computed for each of the normal, noisy and adversarial samples. A logistic regression model is trained on these features and the detector is built.
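To make the noisy-sample idea concrete, the following sketch measures the average L2 perturbation of a batch of adversarial samples and then generates Gaussian noise rescaled to that size. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, Gaussian noise is an assumed choice, the stand-in "adversarial" batch is random data, and the result is not clipped to the valid pixel range (clipping would shrink the perturbation).

```python
import numpy as np


def average_l2_perturbation(x_clean, x_adv):
    """Mean per-sample L2 norm of the perturbation x_adv - x_clean."""
    diff = (x_adv - x_clean).reshape(len(x_clean), -1)
    return float(np.linalg.norm(diff, axis=1).mean())


def make_noisy_samples(x_clean, target_l2, rng):
    """Add Gaussian noise, rescaled so each sample's noise has L2 norm target_l2."""
    noise = rng.standard_normal(x_clean.shape)
    norms = np.linalg.norm(noise.reshape(len(noise), -1), axis=1)
    # Reshape the per-sample scale so it broadcasts over the image dimensions.
    scale = (target_l2 / norms).reshape((-1,) + (1,) * (x_clean.ndim - 1))
    return x_clean + noise * scale


def build_detector_labels(n_normal, n_noisy, n_adv):
    """Normal and noisy samples form the negative class (0); adversarial is 1."""
    return np.concatenate([np.zeros(n_normal + n_noisy), np.ones(n_adv)])


rng = np.random.default_rng(0)
x_clean = rng.random((8, 28, 28, 1))  # stand-in for MNIST-shaped inputs
# Stand-in for real adversarial samples: a fixed-size signed perturbation.
x_adv = np.clip(x_clean + 0.1 * np.sign(rng.standard_normal(x_clean.shape)), 0, 1)

target = average_l2_perturbation(x_clean, x_adv)
x_noisy = make_noisy_samples(x_clean, target, rng)
labels = build_detector_labels(len(x_clean), len(x_noisy), len(x_adv))
```

The uncertainty and density features would then be computed for x_clean, x_noisy and x_adv in that order, so that the rows line up with labels before the logistic regression is fit.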

MNIST Demonstration

Here, a simple demonstration is provided of the commands issued to run the full experiment with MNIST, using the FGSM attack. The following commands are used to run all 3 steps:

1. python train_model.py -d=mnist -e=10
2. python craft_adv_samples.py -d=mnist -a=fgsm
3. python detect_adv_samples.py -d=mnist -a=fgsm
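The three commands can also be chained from Python so that a failure in one step stops the run. The sketch below is one possible driver, not part of the repository: the function names are hypothetical, the default epochs follow the recommendations given earlier, and it assumes the current working directory is the repository root so that scripts/ resolves.

```python
import subprocess
import sys

# Epoch counts recommended above for each data set.
RECOMMENDED_EPOCHS = {"mnist": 10, "cifar": 60, "svhn": 60}


def build_commands(dataset, attack, epochs=None):
    """Return the three commands (train, craft, detect) as argument lists."""
    if dataset not in RECOMMENDED_EPOCHS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if epochs is None:
        epochs = RECOMMENDED_EPOCHS[dataset]
    return [
        [sys.executable, "train_model.py", f"-d={dataset}", f"-e={epochs}"],
        [sys.executable, "craft_adv_samples.py", f"-d={dataset}", f"-a={attack}"],
        [sys.executable, "detect_adv_samples.py", f"-d={dataset}", f"-a={attack}"],
    ]


def run_pipeline(dataset, attack, scripts_dir="scripts", epochs=None):
    """Run the steps in order; check=True raises if any step exits non-zero."""
    for cmd in build_commands(dataset, attack, epochs):
        subprocess.run(cmd, cwd=scripts_dir, check=True)
```

Calling run_pipeline("mnist", "fgsm") would then issue the same three commands as the list above.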