Group project for the Deep Learning course taught in the autumn semester of 2021 at ETH Zürich.
The goal of this project is to research different adversarial attacks and defences. Moreover, we suggest a new type of defence (`adv_gan_reverse`) and re-evaluate the MNIST Challenge. In short, the MNIST Challenge is about attacking a secret, robust model. We additionally apply APE-GAN, which should eliminate some of the adversarial perturbations. Here is an example:
The left image is the original, which is correctly classified as 5 by the robust model. The middle image is the same image after a successful adversarial attack by AdvGAN (now classified as 6). The right image is the middle image after APE-GAN has removed the adversarial perturbation, and it is classified correctly again! For a more detailed explanation and more examples, please read our paper. Note that the images above might look blurry here because of the GitHub viewer, but you can find the same images in the paper.
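The attack-then-purify pipeline described above can be sketched roughly as follows. The `adv_gan` and `ape_gan` functions here are toy stand-ins for the trained generators (the real ones are the networks trained by the scripts below), so this is only a conceptual sketch:

```python
import numpy as np

def adv_gan(x):
    """Toy attack generator: a bounded random perturbation (stand-in only)."""
    rng = np.random.default_rng(0)
    return 0.3 * np.sign(rng.standard_normal(x.shape))

def ape_gan(x_adv):
    """Toy defence generator: the real APE-GAN maps x_adv back towards x."""
    return np.clip(x_adv, 0.0, 1.0)  # placeholder "purification"

x = np.random.default_rng(1).random((28, 28))   # a fake MNIST image in [0, 1]

# 1) Attack: add the generated perturbation and clip to the valid pixel range.
x_adv = np.clip(x + adv_gan(x), 0.0, 1.0)

# 2) Defence: APE-GAN tries to remove the perturbation before classification.
x_restored = ape_gan(x_adv)
```

In the real pipeline, the target classifier is queried on `x`, `x_adv`, and `x_restored`, which is exactly what the three images above illustrate.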
Last but not least, we tried to use PyTorch Lightning modules as much as possible, so they should be convenient to re-use. We also added WandB logging, which makes it much easier to analyse the results and the training.
- Clone this repository along with the challenge repository https://github.com/MadryLab/mnist_challenge:

```sh
git clone https://gitlab.ethz.ch/osaeedi/deep-learning-project/
cd deep-learning-project
git clone https://github.com/MadryLab/mnist_challenge
```
- Download the TensorFlow checkpoints of the models provided by the challenge authors:

```sh
cd mnist_challenge
python fetch_model.py adv_trained
python fetch_model.py secret
```
- Convert each of these checkpoints to a PyTorch checkpoint:

```sh
cd ..
python converter.py --tf-checkpoint-path=mnist_challenge/models/adv_trained --torch-checkpoint-path=dl_logs/target_model/converted_adv_trained
python converter.py --tf-checkpoint-path=mnist_challenge/models/secret --torch-checkpoint-path=dl_logs/target_model/converted_secret
```

If everything went well, each converter run should print: `OK! Accuracies are the same!`
After that, you can remove the mnist_challenge repository:

```sh
rm -r mnist_challenge
```
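Under the hood, the main work of such a conversion is reordering tensor layouts: TensorFlow stores conv kernels as `(H, W, in, out)` while PyTorch expects `(out, in, H, W)`, and dense weights need a transpose. A minimal sketch of that reordering (using numpy arrays in place of actual checkpoint loading; the real `converter.py` may do more than this):

```python
import numpy as np

def tf_conv_to_torch(kernel_hwio: np.ndarray) -> np.ndarray:
    """TF conv2d kernel (H, W, in, out) -> PyTorch Conv2d weight (out, in, H, W)."""
    return kernel_hwio.transpose(3, 2, 0, 1)

def tf_dense_to_torch(weight_io: np.ndarray) -> np.ndarray:
    """TF dense weight (in, out) -> PyTorch Linear weight (out, in)."""
    return weight_io.T

# Shapes matching the first conv layer of the challenge model (5x5, 1 -> 32).
tf_kernel = np.zeros((5, 5, 1, 32))
torch_kernel = tf_conv_to_torch(tf_kernel)
```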
- [OPTIONAL] If you want to customize the paths (e.g., to store checkpoints and other logs in a different folder), adjust `config.py` accordingly:
  - `LOGS_PATH`: path to the root folder that contains all the data: checkpoints, wandb logs, dataset...
  - `TARGET_MODEL_FOLDER`: name of the folder with the PyTorch checkpoints of the target models (specified in the previous step)
  - `TARGET_MODEL_WHITE_BOX_FOLDER`: name of the folder with the checkpoint of the secret model (should be the name of a folder inside `TARGET_MODEL_FOLDER`)
  - `TARGET_MODEL_BLACK_BOX_FOLDER`: name of the folder with the checkpoint of the adversarially trained model (should be the name of a folder inside `TARGET_MODEL_FOLDER`)

  For example, the secret model is then accessed via `{LOGS_PATH}/{TARGET_MODEL_FOLDER}/{TARGET_MODEL_WHITE_BOX_FOLDER}`.
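With the values used in the conversion step above, that path composition looks like this (the constants here are illustrative; the actual defaults live in `config.py`):

```python
from pathlib import Path

# Illustrative values matching the converter commands above,
# not necessarily the defaults in config.py.
LOGS_PATH = Path("dl_logs")
TARGET_MODEL_FOLDER = "target_model"
TARGET_MODEL_WHITE_BOX_FOLDER = "converted_secret"

secret_model_dir = LOGS_PATH / TARGET_MODEL_FOLDER / TARGET_MODEL_WHITE_BOX_FOLDER
print(secret_model_dir)  # dl_logs/target_model/converted_secret (on POSIX)
```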
Train the AdvGAN model by running:

```sh
python train_adv_gan.py [--is-blackbox] [--is-distilled]
```

The flag `--is-blackbox` indicates that AdvGAN is trained against the adversarially trained target model; otherwise, it is trained against the secret target model (the same one used for evaluation).
The flag `--is-distilled` indicates that AdvGAN is trained against a "student" model that is itself trained on the outputs of the specified target model. The target model then acts more like an oracle (or teacher) that only provides labels, and AdvGAN never has access to the target model itself.
An example path to the corresponding checkpoint is `{LOGS_PATH}/{ADV_GAN_FOLDER}/blackbox/not_distilled`.
Training takes a few hours on CPU, so we highly recommend running it on a GPU. Sometimes, when running on a GPU, there is a problem with wandb logging; running this command should help:

```sh
module load eth_proxy
```
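For intuition, AdvGAN's generator is trained on a combination of three losses: an adversarial loss that pushes the target model away from the correct class, a GAN loss from the discriminator, and a hinge penalty bounding the perturbation size. A rough numpy sketch of the generator objective (the weights `alpha`, `beta` and the bound `c` are hypothetical hyperparameters, and this particular adversarial-loss choice is a simplification):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adv_gan_generator_loss(target_logits, true_label, perturbation,
                           disc_score_on_fake, alpha=1.0, beta=1.0, c=0.3):
    # Adversarial loss: push the target model AWAY from the true label
    # (here simply the negative cross-entropy of the true class).
    p = softmax(target_logits)
    loss_adv = np.log(p[true_label] + 1e-12)

    # GAN loss: make the discriminator believe the perturbed image is real.
    loss_gan = -np.log(disc_score_on_fake + 1e-12)

    # Hinge loss: only penalise perturbations whose L2 norm exceeds c.
    loss_hinge = max(0.0, np.linalg.norm(perturbation) - c)

    return loss_adv + alpha * loss_gan + beta * loss_hinge

loss = adv_gan_generator_loss(
    target_logits=np.array([2.0, 0.1, -1.0]),
    true_label=0,
    perturbation=0.05 * np.ones(784),
    disc_score_on_fake=0.9,
)
```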
Train the APE-GAN model by running:

```sh
python train_ape_gan.py [--adv-model=adv_gan] [--is-blackbox] [--is-distilled]
```

The parameter `--adv-model` specifies the attack model that APE-GAN tries to defend against. Currently supported values are `adv_gan`, `fgsm` and `pgd`. We use an implementation of FGSM with cross-entropy loss; PGD uses 40 iterations and is quite slow.
The flag `--is-blackbox` selects the type of the target model, and `--is-distilled` only matters for `--adv-model=adv_gan`.
An example path to the corresponding checkpoint is `{LOGS_PATH}/{APE_GAN_FOLDER}/adv_gan_whitebox_not_distilled`. Note that the folder name corresponds to the attack model. Also, if you run with `--adv-model=adv_gan`, make sure that this version of AdvGAN (with the specified blackbox and distilled flags) was trained in the previous step. We also highly recommend training this model on a GPU.
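For reference, FGSM (one signed-gradient step) and PGD (iterated steps projected back into an eps-ball) can be sketched on a toy differentiable model as follows. This is a generic sketch with a linear "classifier", not the implementation in this repo; `eps=0.3` matches the MNIST Challenge budget:

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.standard_normal((10, 784)), np.zeros(10)   # toy linear classifier

def ce_grad_wrt_input(x, y):
    """Gradient of cross-entropy loss w.r.t. the input, for logits = W @ x + b."""
    z = W @ x + b
    p = np.exp(z - z.max()); p /= p.sum()
    p[y] -= 1.0            # dL/dz = softmax(z) - onehot(y)
    return W.T @ p

def fgsm(x, y, eps=0.3):
    # Single signed-gradient step, then clip to the valid pixel range.
    return np.clip(x + eps * np.sign(ce_grad_wrt_input(x, y)), 0.0, 1.0)

def pgd(x, y, eps=0.3, step=0.01, iters=40):
    # Iterated FGSM-style steps, projected back into the eps-ball around x.
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(ce_grad_wrt_input(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv

x = rng.random(784)
x_fgsm = fgsm(x, y=3)
x_pgd = pgd(x, y=3)
```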
Evaluate the attacks and defences by running:

```sh
python evaluate_attack.py
```

It has many parameters, so we added a help message to this script; nevertheless, it is easy to use. Example:

```sh
python evaluate_attack.py --adv-model=adv_gan --attack-is-blackbox --def-attack-model=fgsm --def-attack-is-distilled
```

This command evaluates the performance of an AdvGAN that was trained against the adversarially trained model (not the secret one). It also evaluates the performance of an APE-GAN against this AdvGAN attack, where the APE-GAN was trained against an FGSM attack with the secret model as target. The last flag (`--def-attack-is-distilled`) is ignored, since the distilled model is only supported for AdvGAN in our implementation.
To look at the charts and images, we use wandb. We highly recommend using it because it provides real-time plots and images (adversarial and restored) during training. Simply sign up there, log in on the first run, and access it! If you don't want to use this feature, pass `--no-wandb` to `train_[adv|ape]_gan.py`.