This is a PyTorch implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks.
We have already uploaded the all2one pretrained backdoored student model (gridTrigger WRN-16-1, target label 5) and the clean teacher model (WRN-16-1) to ./weight/s_net and ./weight/t_net, respectively.
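To sanity-check the downloaded weights, the checkpoints can be opened directly with torch.load. The file names and stored keys in this sketch are assumptions, so list the ./weight/ folders and print the keys to see what the provided .tar files actually contain:

```python
import torch

# Hypothetical file names; check ./weight/s_net and ./weight/t_net for the real ones.
student_ckpt = torch.load('./weight/s_net/WRN-16-1.tar', map_location='cpu')
teacher_ckpt = torch.load('./weight/t_net/WRN-16-1.tar', map_location='cpu')

for name, ckpt in [('student', student_ckpt), ('teacher', teacher_ckpt)]:
    # Checkpoints are often dicts with keys such as 'state_dict' or 'epoch' (assumption).
    print(name, list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```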
To evaluate the performance of NAD, simply run:

$ python main.py

The default parameters are given in config.py. The trained model will be saved to weight/erasing_net/<s_name>.tar.
Please read main.py and config.py carefully, then change the parameters for your experiment.
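For reference, the erasing step in main.py fine-tunes the backdoored student on a small clean set while distilling the clean teacher's intermediate attention maps into it. The snippet below is a minimal sketch of such an AT-style attention-distillation term; the exact attention definition, layer choices, and weighting used by this repo live in main.py and config.py, and the function names here are only illustrative:

```python
import torch
import torch.nn.functional as F

def attention_map(fm, p=2):
    """Collapse a feature map (N, C, H, W) into a flattened spatial attention
    map (N, H*W): average |activation|^p over channels, then L2-normalize."""
    am = fm.abs().pow(p).mean(dim=1)   # (N, H, W)
    am = am.view(am.size(0), -1)       # (N, H*W)
    return F.normalize(am, dim=1)

def nad_attention_loss(student_fms, teacher_fms, betas):
    """Weighted sum of per-layer distances between student and teacher
    attention maps (betas are the per-layer weights)."""
    loss = 0.0
    for s_fm, t_fm, beta in zip(student_fms, teacher_fms, betas):
        loss = loss + beta * (attention_map(s_fm) - attention_map(t_fm)).pow(2).mean()
    return loss

# During erasing this term is added to the usual cross-entropy on clean data:
# total = F.cross_entropy(student_logits, labels) + nad_attention_loss(s_fms, t_fms, betas)
```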
| Dataset | Baseline ACC (%) | Baseline ASR (%) | NAD ACC (%) | NAD ASR (%) |
|---|---|---|---|---|
| CIFAR-10 | 85.65 | 100.0 | 82.12 | 3.57 |
We provide a DatasetBD class in data_loader.py for generating training sets for different backdoor attacks.
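As a self-contained illustration of how such a poisoned sample is built, the sketch below stamps a BadNets-style grid trigger onto the corner of an image and remaps its label to the target class. The patch size, position, and pixel values are assumptions for illustration, not the exact settings of DatasetBD:

```python
import numpy as np

def add_grid_trigger(img, target_label=5, trigger_size=3):
    """Stamp a checkerboard patch on the bottom-right corner of an HxWxC uint8
    image and return the poisoned image together with the attack target label.
    (Illustrative only; see DatasetBD in data_loader.py for the real logic.)"""
    poisoned = img.copy()
    h, w = poisoned.shape[:2]
    # 0/255 checkerboard pattern of shape (trigger_size, trigger_size).
    patch = (np.indices((trigger_size, trigger_size)).sum(axis=0) % 2) * 255
    poisoned[h - trigger_size:h, w - trigger_size:w] = patch[..., None]
    return poisoned, target_label
```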
To implement a backdoor attack (e.g. the gridTrigger attack), run:

$ python train_badnet.py

This command trains a backdoored model and prints its clean accuracy (ACC) and attack success rate (ASR). You can also select the other backdoor triggers reported in the paper.
Please read train_badnet.py and config.py carefully, then change the parameters for your experiment.
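The clean accuracy (ACC) and attack success rate (ASR) printed by the script can be reproduced with a standard evaluation loop. In the sketch below, clean_test_loader and backdoor_test_loader are placeholder names rather than identifiers from this repo; the backdoor loader is assumed to yield triggered inputs whose labels are all set to the attack target:

```python
import torch

@torch.no_grad()
def evaluate(model, loader, device='cuda'):
    """Top-1 accuracy (%) over a loader. On a poisoned loader whose labels are
    all the attack target, this value is the attack success rate (ASR)."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total

# clean_acc = evaluate(model, clean_test_loader)      # ACC
# asr       = evaluate(model, backdoor_test_loader)   # ASR
```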
CL: Clean-label backdoor attacks
SIG: A New Backdoor Attack in CNNs by Training Set Corruption Without Label Poisoning
## reference code
import numpy as np


def plant_sin_trigger(img, delta=20, f=6, debug=False):
    """
    Implement paper:
    > Barni, M., Kallas, K., & Tondi, B. (2019).
    > A new Backdoor Attack in CNNs by training set corruption without label poisoning.
    > arXiv preprint arXiv:1902.11237

    Superimpose a sinusoidal backdoor signal with default parameters.
    """
    alpha = 0.2
    img = np.float32(img)
    pattern = np.zeros_like(img)
    m = pattern.shape[1]
    # Horizontal sinusoid: every pixel in column j (all rows and channels)
    # receives the same offset delta * sin(2 * pi * j * f / m).
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            pattern[i, j] = delta * np.sin(2 * np.pi * j * f / m)
    # Blend the clean image with the pattern and clip back to uint8 range.
    img = alpha * np.uint32(img) + (1 - alpha) * pattern
    img = np.uint8(np.clip(img, 0, 255))
    # if debug:
    #     cv2.imshow('planted image', img)
    #     cv2.waitKey()
    return img
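For example, the SIG trigger can be planted on a single CIFAR-10-sized image as follows (the input here is random data, only to show the expected HxWxC uint8 format):

```python
import numpy as np

clean = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
poisoned = plant_sin_trigger(clean, delta=20, f=6)
print(poisoned.shape, poisoned.dtype)  # (32, 32, 3) uint8
```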
Refool: Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks
MCR: Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness
Fine-tuning & Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
Note: TrojanZoo provides a universal PyTorch platform for conducting security research (especially on backdoor attacks/defenses) in image classification with deep learning.

Backdoors 101 is a PyTorch framework for state-of-the-art backdoor defenses and attacks on deep learning models.
If you find this code useful for your research, please cite our paper:
@inproceedings{li2021neural,
title={Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks},
author={Li, Yige and Lyu, Xixiang and Koren, Nodens and Lyu, Lingjuan and Li, Bo and Ma, Xingjun},
booktitle={ICLR},
year={2021}
}
If you have any questions, please leave a message via GitHub.