This is a PyTorch implementation of "Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks" (CVPR 2023).
Abstract: *In this work, we study the black-box targeted attack problem from the model discrepancy perspective. On the theoretical side, we present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack. We reveal that the attack error on a target model mainly depends on empirical attack error on the substitute model and the maximum model discrepancy among substitute models. On the algorithmic side, we derive a new algorithm for black-box targeted attacks based on our theoretical analysis, in which we additionally minimize the maximum model discrepancy (M3D) of the substitute models when training the generator to generate adversarial examples. In this way, our model is capable of crafting highly transferable adversarial examples that are robust to the model variation, thus improving the success rate for attacking the black-box model. We conduct extensive experiments on the ImageNet dataset with different classification models, and our proposed approach outperforms existing state-of-the-art methods by a significant margin.*
If you find our work, this repository, and the pretrained adversarial generators useful, please consider giving the repository a star ⭐ and citing our work.
@inproceedings{zhao2023minimizing,
title={Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks},
author={Zhao, Anqi and Chu, Tong and Liu, Yahao and Li, Wen and Li, Jingjing and Duan, Lixin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={8153--8162},
year={2023}
}
- Contributions
- Acknowledge
- Pretrained Targeted Generator
- Installation
- Training
- The Implementation Details of Discriminators
- Evaluation
- We present a generalization error bound for black-box targeted attacks based on the model discrepancy perspective.
- We design a novel generative approach called Minimizing Maximum Model Discrepancy (M3D) attack to craft adversarial examples with high transferability based on the generalization error bound.
- We demonstrate the effectiveness of our method with strong empirical results, where our approach outperforms the state-of-the-art methods by a significant margin.
Code adapted from TTP. We thank them for their wonderful code base.
If you find our pretrained Adversarial Generators useful, please consider citing our work.
Class to Label Mapping
Class Number: Class Name
24: Great Grey Owl
99: Goose
245: French Bulldog
344: Hippopotamus
471: Cannon
555: Fire Engine
661: Model T
701: Parachute
802: Snowmobile
919: Street Sign
The pretrained generators are saved as `netG_<Discriminator>_<epoch>_<targetLabel>.pth`. For example, `netG_vgg19_bn_9_919.pth` is a generator trained by M3D against vgg19_bn (the discriminator) for 10 epochs (epoch index 9), with target label 919 (Street Sign).
Source Model | 24 | 99 | 245 | 344 | 471 | 555 | 661 | 701 | 802 | 919 |
---|---|---|---|---|---|---|---|---|---|---|
VGG19_BN | Grey Owl | Goose | French Bulldog | Hippopotamus | Cannon | Fire Engine | Model T | Parachute | Snowmobile | Street Sign |
ResNet50 | Grey Owl | Goose | French Bulldog | Hippopotamus | Cannon | Fire Engine | Model T | Parachute | Snowmobile | Street Sign |
DenseNet121 | Grey Owl | Goose | French Bulldog | Hippopotamus | Cannon | Fire Engine | Model T | Parachute | Snowmobile | Street Sign |
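
The file-name convention described above can be unpacked programmatically, which is handy when looping over downloaded checkpoints. The following is a minimal sketch, not part of the repository: `GeneratorCheckpoint` and `parse_generator_name` are hypothetical names introduced here, and the parser assumes every file follows the `netG_<Discriminator>_<epoch>_<targetLabel>.pth` pattern exactly.

```python
from typing import NamedTuple


class GeneratorCheckpoint(NamedTuple):
    """Hypothetical container for the fields encoded in a checkpoint name."""
    discriminator: str  # e.g. "vgg19_bn"
    epoch: int          # epoch index as written in the file name
    target_label: int   # ImageNet class index of the attack target


def parse_generator_name(filename: str) -> GeneratorCheckpoint:
    """Split 'netG_<Discriminator>_<epoch>_<targetLabel>.pth' into its parts."""
    prefix, suffix = "netG_", ".pth"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"unexpected checkpoint name: {filename!r}")
    stem = filename[len(prefix):-len(suffix)]
    # Split from the right, because the discriminator name may itself
    # contain underscores (as in "vgg19_bn").
    discriminator, epoch, target = stem.rsplit("_", 2)
    return GeneratorCheckpoint(discriminator, int(epoch), int(target))


ckpt = parse_generator_name("netG_vgg19_bn_9_919.pth")
print(ckpt.discriminator, ckpt.epoch, ckpt.target_label)
```

Splitting from the right keeps model names such as `vgg19_bn` intact, whereas a plain `split("_")` would break them apart.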
conda create --name M3D -y python=3.7.0
conda activate M3D
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
#apex
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Run the following command to train a generator:
CUDA_VISIBLE_DEVICES=0,1,2 nohup python3 -m torch.distributed.launch --nproc_per_node=3 --master_port 23411 train_M3D.py --gs --match_target 802 --batch_size 16 --epochs 10 --model_type vgg19_bn --log_dir ./checkpoint/vgg19_bn_M3D_3gpu/ --save_dir ./checkpoint/vgg19_bn_M3D_3gpu --apex_train 1 > ./checkpoint/vgg19_bn_M3D_3gpu/output_802.txt
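
The command above trains a generator for a single target (802). To cover all ten targets from the class mapping, one option is to generate one command per label. The sketch below only builds the command strings and does not run anything. It assumes that changing `--match_target` (and the output file name) is all that is needed per target, and that the output directory already exists before the shell redirect is executed. `build_train_command` is a hypothetical helper, not part of the repository.

```python
# The ten target labels listed in the class-to-label mapping.
TARGET_LABELS = [24, 99, 245, 344, 471, 555, 661, 701, 802, 919]


def build_train_command(target: int, model_type: str = "vgg19_bn") -> str:
    """Return the training command from this README for a given target label."""
    # Directory naming follows the example command; other layouts are possible.
    out_dir = f"./checkpoint/{model_type}_M3D_3gpu"
    return (
        "CUDA_VISIBLE_DEVICES=0,1,2 nohup python3 -m torch.distributed.launch "
        "--nproc_per_node=3 --master_port 23411 train_M3D.py --gs "
        f"--match_target {target} --batch_size 16 --epochs 10 "
        f"--model_type {model_type} --log_dir {out_dir}/ --save_dir {out_dir} "
        f"--apex_train 1 > {out_dir}/output_{target}.txt"
    )


commands = [build_train_command(t) for t in TARGET_LABELS]
for command in commands:
    print(command)
```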
Note that because D1 and D2 receive the same training data, the two models must be initialized slightly differently for the model discrepancy loss to work. We simply use a pre-trained model for one discriminator and, for the other, a model fine-tuned for one batch on ImageNet training data. For convenience, the fine-tuned models are saved in 'pretrain_save_models'. You can also fine-tune the discriminator during training.
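
To see why the two discriminators need different starting points, the toy sketch below copies a model, measures the discrepancy between the copies, then fine-tunes one copy for a single batch and measures again. It is an illustration under stated assumptions, not the code in `train_M3D.py`: the tiny linear layer stands in for a pretrained ImageNet classifier, the data is random, and the mean L1 distance between softmax outputs is used only as an example discrepancy measure.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-in for a pretrained classifier (assumption for illustration;
# the repository uses ImageNet models such as vgg19_bn).
d1 = nn.Linear(8, 4)
d2 = copy.deepcopy(d1)  # identical weights at this point


def discrepancy(a: nn.Module, b: nn.Module, x: torch.Tensor) -> float:
    """Mean L1 distance between the two models' softmax outputs."""
    with torch.no_grad():
        diff = F.softmax(a(x), dim=1) - F.softmax(b(x), dim=1)
        return diff.abs().mean().item()


x = torch.randn(16, 8)          # one batch of stand-in inputs
y = torch.randint(0, 4, (16,))  # stand-in labels

before = discrepancy(d1, d2, x)

# Fine-tune only d2 for a single batch, leaving d1 untouched.
optimizer = torch.optim.SGD(d2.parameters(), lr=0.1)
optimizer.zero_grad()
F.cross_entropy(d2(x), y).backward()
optimizer.step()

after = discrepancy(d1, d2, x)
print(f"discrepancy before: {before}, after: {after}")
```

With identical weights the two outputs coincide, so the measured discrepancy is zero; a single gradient step on one copy is enough to make it non-zero.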
Run the following command to evaluate the transferability of the 10 targets to a black-box model on the ImageNet validation set:
CUDA_VISIBLE_DEVICES=0 python eval_M3D.py --source_model resnet50 --target_model densenet121
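
For orientation, transferability of a targeted attack is commonly summarized as the fraction of adversarial examples that the black-box model classifies as the target label. The sketch below shows that computation on made-up predictions. It assumes this standard definition; the exact metric and output format of `eval_M3D.py` are not shown in this README, and `targeted_success_rate` is a hypothetical helper.

```python
from typing import Sequence


def targeted_success_rate(predictions: Sequence[int], target: int) -> float:
    """Fraction of predictions that equal the attack's target label."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p == target)
    return hits / len(predictions)


# Made-up black-box predictions for four adversarial images aimed at
# target 919 (Street Sign); three of the four hit the target.
preds = [919, 919, 24, 919]
rate = targeted_success_rate(preds, target=919)
print(f"{rate:.2%}")  # prints 75.00%
```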
Suggestions and questions are welcome!