
TransLearn

ABOUT

This repository contains the code implementation of the paper "With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning", presented at USENIX Security 2018.

DEPENDENCIES

Code is implemented using a mixture of Keras and TensorFlow. The following packages are used to perform the attack and set up the attack evaluation:

  • keras==2.2.0
  • numpy==1.14.0
  • tensorflow-gpu==1.8.0
  • h5py==2.8.0

The code is tested using Python 2.7.
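For example, assuming a Python 2.7 environment, the dependencies can be installed with pip:

pip install keras==2.2.0 numpy==1.14.0 tensorflow-gpu==1.8.0 h5py==2.8.0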

HOWTO

Attack

We include a sample script that demonstrates how to perform the attack on the Face Recognition example and how to evaluate the attack performance.

python pubfig65_vggface_mimic_penalty_dssim.py

Several parameters need to be modified before running the code; they are located in the "PARAMETER" section of the script.

  1. Model files of the Teacher and Student are specified by TEACHER_MODEL_FILE and STUDENT_MODEL_FILE. Download the pre-trained models using the links provided in the PRE-TRAINED MODELS section below, and place them under the models folder.
  2. We included a sample data file, which includes 1 image for each label in the Student model. Download the data file, and place it under the datasets folder.
  3. If you are using a GPU, you need to specify which GPU to use for the attack. This is specified by the DEVICE variable. If the specified GPU is not found, the code falls back to CPU by default.
  4. The remaining parameters configure the attack itself; the most important are NB_PAIR and DSSIM_THRESHOLD. A sketch of the parameter block is shown after this list.
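As a rough reference, the PARAMETER section looks like the following sketch. All paths and values here are placeholders, not the script's actual defaults; adjust them to your setup.

# placeholder values -- adjust to your setup
DEVICE = '/gpu:0'                           # GPU to run the attack on; falls back to CPU if not found
TEACHER_MODEL_FILE = 'models/teacher.h5'    # hypothetical path to the Teacher model
STUDENT_MODEL_FILE = 'models/student.h5'    # hypothetical path to the Student model
DATA_FILE = 'datasets/pubfig65.h5'          # hypothetical path to the sample data file
NB_PAIR = 2                                 # number of source/target image pairs to attack
DSSIM_THRESHOLD = 0.003                     # perturbation budget; lower means stealthier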

Fingerprinting

We include two scripts showcasing how to fingerprint the Teacher model given a Student. pubfig65_fingerprint_vggface.py fingerprints the VGGFace model and tests on the Face Recognition model, which uses VGGFace as its Teacher. pubfig65_fingerprint_vgg16.py fingerprints the VGG-16 model and tests on the same Face Recognition model. As described in the paper, the fingerprint image of the correct Teacher should produce an evenly-distributed prediction result, which has a very low Gini coefficient. For example, pubfig65_fingerprint_vggface.py (the correct Teacher) produces a Gini coefficient of 0.003539, while pubfig65_fingerprint_vgg16.py (an incorrect Teacher) produces 0.508905.
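For reference, the Gini coefficient of a prediction vector can be computed with a helper like the one below. This is an illustrative sketch, not the repository's exact implementation.

import numpy as np

def gini(probs):
    # Gini coefficient of a probability vector: ~0 when all labels are
    # equally likely, close to 1 when mass concentrates on few labels.
    probs = np.sort(np.asarray(probs, dtype=float))
    n = probs.shape[0]
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * probs) / (n * np.sum(probs))

# an evenly-distributed prediction over 65 labels yields 0
print(gini(np.ones(65) / 65.0))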

To run these examples, simply run

python pubfig65_fingerprint_vggface.py

Similar to the previous attack example, there are several parameters you need to change, along with a few modifications specific to fingerprinting.

  1. You need to specify the GPU used in DEVICE.
  2. Paths to model files are specified by TEACHER_MODEL_FILE and STUDENT_MODEL_FILE. Alternatively, you can load the Teacher model directly from Keras inside the load_and_build_models() function.
  3. The DSSIM threshold (DSSIM_THRESHOLD) is set to 1 in fingerprinting. Fingerprinting is not intended to be an attack, therefore it does not have to be stealthy.
  4. When building the attacker, the mimic_img flag is set to False, because we mimic an all-zero vector instead of the internal representation of a target image. See the sketch after this list.
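A minimal sketch of what these two settings mean in practice. The bottleneck size below is a placeholder; use the Teacher's actual layer size.

import numpy as np

# DSSIM_THRESHOLD = 1 effectively disables the perceptual constraint,
# since fingerprinting does not need to be stealthy.
DSSIM_THRESHOLD = 1.0

# With mimic_img=False, the optimization target is an all-zero vector
# of the same shape as the Teacher's bottleneck output, rather than
# the internal representation of a target image.
bottleneck_dim = 4096  # placeholder; set to the Teacher's bottleneck size
target_vector = np.zeros((1, bottleneck_dim))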

Patch

This script contains an example of how to patch a DNN using the updated loss function. To run this script, simply run

python pubfig65_patch_neuron_distance.py

Similar to the previous examples, there is some setup required before running this script, as described below.

  1. Paths to model files are specified by TEACHER_MODEL_FILE and STUDENT_MODEL_FILE.
  2. DATA_FILE specifies the path to the training/testing dataset. We use the h5 format to store the dataset, but you can change it to any format you prefer. The dataset is loaded by the load_dataset() function; be sure to modify the function if you change the dataset format. A sketch of such a loader is shown after this list.
  3. As before, you need to specify the GPU used for training. This is specified by DEVICE.
  4. Parameters used by the patching are specified in the script. We incrementally increase the neuron distance threshold to stabilize the training process. More details are included in the documentation of the script.
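For reference, a load_dataset() for an h5 file could look like the sketch below. The dataset key names ('X_train', 'Y_train', ...) are assumptions; match them to how your h5 file is organized.

import h5py
import numpy as np

def load_dataset(data_file):
    # Read train/test splits from an h5 file. Key names are
    # hypothetical -- adjust them to your dataset layout.
    with h5py.File(data_file, 'r') as f:
        X_train = np.array(f['X_train'])
        Y_train = np.array(f['Y_train'])
        X_test = np.array(f['X_test'])
        Y_test = np.array(f['Y_test'])
    return X_train, Y_train, X_test, Y_test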

DATASETS

Below is the list of datasets we used in the paper.

  • PubFig: This dataset is used to train the Face Recognition model in the paper. Detailed information about this dataset is included in this page. We use a specific version of the dataset, where images are aligned.
  • CASIA Iris: This dataset is used to train the Iris Recognition model. Detailed information is included in this page.
  • GTSRB: This dataset is used to train the Traffic Sign Recognition model. Detailed information can be found here.
  • VGG Flower: This dataset is used to train the Flower Recognition model. Detailed information and the download link can be found here.

PRE-TRAINED MODELS

Below is a list of links to pre-trained models we used in the paper. All models are hosted on Dropbox.

  • Face Recognition: link to model. This model uses ImageNet mean-centering as preprocessing.
  • Iris Recognition: link to model. This model uses ImageNet mean-centering as preprocessing.
  • Traffic Sign Recognition: link to model. This model uses ImageNet mean-centering as preprocessing.
  • Flower Recognition: link to model. This model uses Inception preprocessing, which rescales the input to [-1, 1].

We also converted the pre-trained VGGFace model from Caffe to Keras. The architecture is defined in utils_translearn.py, and the pre-trained model weights can be downloaded here.