Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples

This repository is the official implementation of Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples.

We propose two effective backdoor defense methods: D-ST, which trains a secure model from scratch, and D-BR, which removes the backdoor from a backdoored model. D-ST consists of two modules: the SD module and the ST module. D-BR consists of two modules: the SD module and the BR module. The SD module distinguishes samples according to the FCT metric and separates the training samples into clean samples, poisoned samples, and uncertain samples. The ST module first trains a secure feature extractor via semi-supervised contrastive learning and then trains a secure classifier by minimizing a mixed cross-entropy loss. The BR module removes the backdoor by unlearning the distinguished poisoned samples and relearning the distinguished clean samples.

Environment

This code is implemented in PyTorch and has been tested under the following environment settings:

  • python = 3.8.8
  • torch = 1.7.1
  • torchvision = 0.8.2
  • tensorflow = 2.4.1

Default Configurations

The default configurations are as follows:

  • dataset = cifar10
  • model = resnet18
  • poison_rate = 0.10
  • target_type = all2one
  • trigger_type = gridTrigger

You can change these configurations to apply our method in different settings; see the example command after the table. Note that the attacks used in our paper correspond to the following configurations:

| Attack | target_type | trigger_type |
| --- | --- | --- |
| BadNets-all2one | all2one | gridTrigger |
| BadNets-all2all | all2all | squareTrigger |
| Trojan | all2one | trojanTrigger |
| Blend-Strip | all2one | signalTrigger |
| Blend-Kitty | all2one | kittyTrigger |
| SIG | cleanLabel | sigTrigger |
| CL | cleanLabel | fourCornerTrigger |
| SSBA | all2one | SSBA |
| BadNets-all2one (on ImageNet) | all2one | squareTrigger_imagenet |
| Blend-Strip (on ImageNet) | all2one | signalTrigger_imagenet |
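
For example, switching from the default BadNets-all2one setting to the Trojan attack only requires changing --trigger_type in the commands below; the other flags keep their default values. For the first SD step this would be:

python train_attack_noTrans.py --dataset cifar10 --model resnet18 --trigger_type trojanTrigger --epochs 2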

Performance

Results on CIFAR-10:

| Attack | Backdoored ACC | Backdoored ASR | D-BR ACC | D-BR ASR | D-ST ACC | D-ST ASR |
| --- | --- | --- | --- | --- | --- | --- |
| BadNets-all2one | 91.64 | 100.00 | 92.83 | 0.40 | 92.77 | 0.03 |
| BadNets-all2all | 92.79 | 88.01 | 92.61 | 0.56 | 89.22 | 2.05 |
| Trojan | 91.91 | 100.00 | 92.21 | 0.76 | 93.72 | 0.00 |
| Blend-Strip | 92.09 | 99.97 | 92.40 | 0.06 | 93.59 | 0.00 |
| Blend-Kitty | 92.69 | 99.99 | 92.11 | 0.14 | 91.82 | 0.00 |
| SIG | 92.88 | 99.69 | 92.73 | 0.24 | 90.07 | 0.00 |
| CL | 93.20 | 93.34 | 92.08 | 0.00 | 90.46 | 6.40 |

Results on CIFAR-100:

| Attack | Backdoored ACC | Backdoored ASR | D-BR ACC | D-BR ASR | D-ST ACC | D-ST ASR |
| --- | --- | --- | --- | --- | --- | --- |
| BadNets-all2one | 71.23 | 99.13 | 72.58 | 0.25 | 68.43 | 0.12 |
| Trojan | 75.75 | 100.00 | 74.52 | 0.00 | 68.04 | 0.08 |
| Blend-Strip | 75.54 | 99.99 | 74.35 | 0.00 | 67.63 | 0.00 |
| Blend-Kitty | 75.18 | 99.97 | 72.00 | 0.01 | 67.06 | 0.00 |

Results on ImageNet Subset:

| Attack | Backdoored ACC | Backdoored ASR | D-BR ACC | D-BR ASR |
| --- | --- | --- | --- | --- |
| BadNets-all2one | 84.72 | 95.80 | 83.66 | 0.00 |
| Blend-Strip | 84.36 | 97.64 | 80.40 | 0.00 |
| Blend-Kitty | 85.46 | 99.68 | 84.29 | 0.00 |
| SSBA | 85.24 | 99.64 | 83.77 | 0.09 |

Sample-Distinguishment (SD) Module

Step1: Train a backdoored model without any data augmentations.

python train_attack_noTrans.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --epochs 2

Models of each epoch are saved at "./saved/backdoored_model/poison_rate_0.1/noTrans/cifar10/resnet18/gridTrigger".

Step2: Fine-tune the backdoored model with intra-class loss $\mathcal{L}_{intra}$.

python finetune_attack_noTrans.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --epochs 10 --checkpoint_load ./saved/backdoored_model/poison_rate_0.1/noTrans/cifar10/resnet18/gridTrigger/1.tar

This step aims to enlarge the distance between genuinely clean samples of the target class and genuinely poisoned samples.

--checkpoint_load specifies the path of the backdoored model.

Models of each epoch are saved at "./saved/backdoored_model/poison_rate_0.1/noTrans_ftsimi/cifar10/resnet18/gridTrigger".
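
The exact form of $\mathcal{L}_{intra}$ is implemented in finetune_attack_noTrans.py; as a hedged illustration of the idea (not necessarily the repository's exact loss), an intra-class term can penalize feature similarity between samples that share a label, so that the clean and poisoned sub-populations of the target class drift apart:

```python
import torch.nn.functional as F

def intra_class_loss(features, labels):
    """Illustrative intra-class loss (assumption, not necessarily the exact L_intra):
    penalize pairwise cosine similarity between samples carrying the same label."""
    feats = F.normalize(features, dim=1)                    # (B, D) unit-norm features
    sim = feats @ feats.t()                                 # (B, B) cosine similarities
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    same_class.fill_diagonal_(False)                        # ignore self-pairs
    if same_class.sum() == 0:
        return sim.new_zeros(())
    # Minimizing this pushes same-label features apart, enlarging the clean/poisoned gap.
    return sim[same_class].mean()
```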

Step3: Calculate the values of the FCT metric ($\Delta_{trans}(x;\tau,f)$) for all training samples.

python calculate_consistency.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --checkpoint_load ./saved/backdoored_model/poison_rate_0.1/noTrans_ftsimi/cifar10/resnet18/gridTrigger/9.tar

--checkpoint_load specifies the path of the fine-tuned model.

The values of $\Delta_{trans}(x;\tau,f)$ for all training samples are saved at "./saved/backdoored_model/poison_rate_0.1/noTrans_ftsimi/cifar10/resnet18/gridTrigger/9_all.txt". In addition, the values for genuinely clean samples are saved as "9_clean.txt", while those for genuinely poisoned samples are saved as "9_poison.txt". These files are used in the following visualization step.
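
For intuition, the FCT metric measures how much a sample's deep features move when the input is transformed; poisoned samples are markedly more sensitive to such transformations than clean ones. Below is a minimal sketch of the computation, assuming $f$ is the fine-tuned feature extractor and $\tau$ a standard augmentation (the exact transformation and distance are those used in calculate_consistency.py):

```python
import torch

@torch.no_grad()
def fct(feature_extractor, x, transform):
    """Sketch of Delta_trans(x; tau, f): distance between the features of a sample
    and of its transformed version (illustrative; see calculate_consistency.py for
    the transformation and distance actually used)."""
    f_x = feature_extractor(x)               # features of the original batch
    f_tx = feature_extractor(transform(x))   # features of the transformed batch
    return (f_x - f_tx).norm(dim=1)          # per-sample L2 distance
```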

If you want to visualize values of the FCT metric, you can run:

python visualize_consistency.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --checkpoint_load ./saved/backdoored_model/poison_rate_0.1/noTrans_ftsimi/cifar10/resnet18/gridTrigger/9.tar

The resulting figure is saved as "9.jpg".

Step4: Calculate thresholds for choosing clean and poisoned samples.

python calculate_gamma.py --clean_ratio 0.20 --poison_ratio 0.05 --checkpoint_load ./saved/backdoored_model/poison_rate_0.1/noTrans_ftsimi/cifar10/resnet18/gridTrigger/9.tar 

In this step, you obtain two values, $\gamma_c$ and $\gamma_p$, which are used in the next step as gamma_low and gamma_high, respectively.

--clean_ratio and --poison_ratio specify $\alpha_c$ and $\alpha_p$, respectively. --checkpoint_load specifies the path of the fine-tuned model.
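
Intuitively, the thresholds are chosen from the distribution of FCT values so that roughly a fraction $\alpha_c$ of the samples with the smallest values falls below $\gamma_c$ and a fraction $\alpha_p$ with the largest values lies above $\gamma_p$. A hedged sketch of such a quantile computation (calculate_gamma.py may differ in details):

```python
import numpy as np

def calculate_gamma(fct_values, clean_ratio=0.20, poison_ratio=0.05):
    """Illustrative threshold selection (assumption; see calculate_gamma.py for the
    repository's exact rule): gamma_c is the clean_ratio-quantile of the FCT values,
    gamma_p the (1 - poison_ratio)-quantile."""
    values = np.sort(np.asarray(fct_values))
    gamma_c = values[int(clean_ratio * (len(values) - 1))]           # used as gamma_low
    gamma_p = values[int((1.0 - poison_ratio) * (len(values) - 1))]  # used as gamma_high
    return gamma_c, gamma_p
```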

Step5: Separate training samples into clean samples $\hat{D}_c$, poisoned samples $\hat{D}_p$, and uncertain samples $\hat{D}_u$.

python separate_samples.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --batch_size 1 --clean_ratio 0.20 --poison_ratio 0.05 --gamma_low 0.0 --gamma_high 19.71682357788086 --checkpoint_load ./saved/backdoored_model/poison_rate_0.1/noTrans_ftsimi/cifar10/resnet18/gridTrigger/9.tar

--gamma_low and --gamma_high specify $\gamma_c$ and $\gamma_p$, respectively, which are derived from the former step. --checkpoint_load specifies the path of the fine-tuned model.

The separated samples are saved at "./saved/separated_samples/cifar10/resnet18/gridTrigger_0.2_0.05". Specifically, $\hat{D}_c$ is saved as "clean_samples.npy". $\hat{D}_p$ is saved as "poison_samples.npy". $\hat{D}_u$ is saved as "suspicous_samples.npy".
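
Conceptually, the separation compares each sample's FCT value with the two thresholds: values at or below gamma_low go to $\hat{D}_c$, values at or above gamma_high go to $\hat{D}_p$, and everything in between goes to $\hat{D}_u$. A hedged sketch (separate_samples.py additionally handles the data loading and the .npy bookkeeping):

```python
import numpy as np

def separate(fct_values, gamma_low, gamma_high):
    """Illustrative separation rule (assumption; see separate_samples.py):
    low-sensitivity samples -> clean, high-sensitivity samples -> poisoned,
    the rest -> uncertain."""
    v = np.asarray(fct_values)
    clean_idx = np.where(v <= gamma_low)[0]                          # indices of D_c_hat
    poison_idx = np.where(v >= gamma_high)[0]                        # indices of D_p_hat
    uncertain_idx = np.where((v > gamma_low) & (v < gamma_high))[0]  # indices of D_u_hat
    return clean_idx, poison_idx, uncertain_idx
```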

Two-stage Secure Training (ST) Module

cd ST

Step1: Train the feature extractor via semi-supervised contrastive learning.

python train_extractor.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --epochs 200 --learning_rate 0.5 --temp 0.1 --cosine --save_freq 20 --batch_size 512

Parameters are set the same as in Supervised Contrastive Learning (https://github.com/HobbitLong/SupContrast).

Checkpoints are saved at "./save/poison_rate_0.1/SupCon_models/cifar10/resnet18/gridTrigger_0.2_0.05/SupCon_cifar10_resnet18_lr_0.5_decay_0.0001_bsz_512_temp_0.1_trial_0_cosine_warm".
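
For intuition, the semi-supervised contrastive objective treats the separated subsets differently: distinguished clean samples keep their labels and attract all same-class views (SupCon-style), while the remaining samples are treated as unlabeled and only attract their own augmented view (SimCLR-style), so poisoned samples can no longer tie the trigger to the target class. A hedged sketch of such a loss, assuming z1 and z2 are the two projected views of a batch and is_clean marks the distinguished clean samples (the actual objective is defined in ST/train_extractor.py):

```python
import torch
import torch.nn.functional as F

def semi_supervised_contrastive_loss(z1, z2, labels, is_clean, temperature=0.1):
    """Illustrative semi-supervised contrastive loss (assumption; see ST/train_extractor.py):
    clean samples use label-based positives (SupCon-style), all other samples only use
    their own second view as the positive (SimCLR-style)."""
    batch = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2B, D) normalized projections
    sim = z @ z.t() / temperature                          # (2B, 2B) similarity logits
    eye = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))              # exclude self-comparisons

    labels2 = torch.cat([labels, labels], dim=0)
    clean2 = torch.cat([is_clean, is_clean], dim=0)
    same_label = labels2.unsqueeze(0) == labels2.unsqueeze(1)
    both_clean = clean2.unsqueeze(0) & clean2.unsqueeze(1)
    other_view = torch.roll(eye, shifts=batch, dims=1)     # pairs each view with its augmentation
    positives = ((same_label & both_clean) | other_view) & ~eye

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~positives, 0.0)   # keep only positive-pair terms
    per_sample = -pos_log_prob.sum(1) / positives.sum(1).clamp(min=1)
    return per_sample.mean()
```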

Step2: Train the classifier via minimizing a mixed cross-entropy loss.

python train_classifier.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --epochs 10 --learning_rate 5 --batch_size 512 --ckpt ./save/poison_rate_0.1/SupCon_models/cifar10/resnet18/gridTrigger/SupCon_cifar10_resnet18_lr_0.5_decay_0.0001_bsz_512_temp_0.1_trial_0_cosine_warm/last.pth

Parameters are set the same as in Supervised Contrastive Learning. --ckpt specifies the path of the trained feature extractor.

Checkpoints are saved at "./save/poison_rate_0.1/SupCon_models/cifar10/resnet18/gridTrigger_0.2_0.05/Linear_cifar10_resnet18_lr_5.0_decay_0_bsz_512".
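
For intuition, one way to realize a mixed cross-entropy objective on top of the frozen extractor is to combine standard cross-entropy on the distinguished clean samples with a term that penalizes predicting the labels attached to the distinguished poisoned samples. This is a hedged illustration only; the actual loss is defined in ST/train_classifier.py:

```python
import torch.nn.functional as F

def mixed_cross_entropy(logits_clean, labels_clean, logits_poison, labels_poison, beta=1.0):
    """Illustrative mixed cross-entropy (assumption; see ST/train_classifier.py for the
    exact loss): fit the clean samples, and push poisoned samples away from the labels
    they carry (which include the attacker's target label)."""
    ce_clean = F.cross_entropy(logits_clean, labels_clean)
    ce_poison = F.cross_entropy(logits_poison, labels_poison)
    # Subtracting the poisoned-sample term maximizes its loss, i.e. discourages the target label.
    return ce_clean - beta * ce_poison
```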

Step3: Test the final model.

python test.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --model_ckpt ./save/poison_rate_0.1/SupCon_models/cifar10/resnet18/gridTrigger/SupCon_cifar10_resnet18_lr_0.5_decay_0.0001_bsz_512_temp_0.1_trial_0_cosine_warm/last.pth --classifier_ckpt ./save/poison_rate_0.1/SupCon_models/cifar10/resnet18/gridTrigger/Linear_cifar10_resnet18_lr_5.0_decay_0_bsz_512/ckpt_epoch_9.pth

--model_ckpt and --classifier_ckpt specify the path of the trained feature extractor and classifier, respectively.

Backdoor Removal (BR) Module

Step1: Train a backdoored model with classical data augmentations.

If you use cifar10 or cifar100 as the dataset, please run the following command.

python train_attack_withTrans.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --epochs 200

Models of each epoch are saved at "./saved/backdoored_model/poison_rate_0.1/withTrans/cifar10/resnet18/gridTrigger". Note that you can refer to https://github.com/weiaicunzai/pytorch-cifar100 if you want higher accuracy for the model trained on cifar100.

If you use imagenet as the dataset, please run the following command.

python train_attack_withTrans_imagenet.py --dataset imagenet --model resnet18 --trigger_type squareTrigger_imagenet --epochs 25 --lr 0.001 --gamma 0.1 --schedule 15 20

Models of each epoch are saved at "./saved/backdoored_model/poison_rate_0.1/withTrans/imagenet/resnet18/squareTrigger_imagenet".

Step2: Unlearn and relearn the backdoored model.

python unlearn_relearn.py --dataset cifar10 --model resnet18 --trigger_type gridTrigger --epochs 20 --clean_ratio 0.20 --poison_ratio 0.05 --checkpoint_load ./saved/backdoored_model/poison_rate_0.1/withTrans/cifar10/resnet18/gridTrigger/199.tar --checkpoint_save ./saved/backdoored_model/poison_rate_0.1/withTrans/cifar10/resnet18/gridTrigger/199_unlearn_purify.py --log ./saved/backdoored_model/poison_rate_0.1/withTrans/cifar10/resnet18/gridTrigger/unlearn_purify.csv

This step unlearns the distinguished poisoned samples $\hat{D}_p$ and relearns the distinguished clean samples $\hat{D}_c$.

--checkpoint_load specifies the path of the backdoored model. --checkpoint_save specifies the path to save the model. --log specifies the path to record.
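
Conceptually, the BR module alternates a gradient-ascent step on the distinguished poisoned samples (unlearning) with a gradient-descent step on the distinguished clean samples (relearning). A hedged sketch of one such epoch (the actual schedule, learning rates, and stopping criterion are in unlearn_relearn.py):

```python
import torch.nn.functional as F

def unlearn_relearn_epoch(model, optimizer, poison_loader, clean_loader, device="cuda"):
    """Illustrative BR epoch (assumption; see unlearn_relearn.py for the exact procedure):
    maximize the loss on distinguished poisoned samples, minimize it on clean ones."""
    model.train()
    for (x_p, y_p), (x_c, y_c) in zip(poison_loader, clean_loader):
        # Unlearn: gradient ascent on poisoned samples (note the minus sign).
        optimizer.zero_grad()
        loss_p = -F.cross_entropy(model(x_p.to(device)), y_p.to(device))
        loss_p.backward()
        optimizer.step()

        # Relearn: standard gradient descent on clean samples.
        optimizer.zero_grad()
        loss_c = F.cross_entropy(model(x_c.to(device)), y_c.to(device))
        loss_c.backward()
        optimizer.step()
```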

Implementation Tips for Some Attacks

CL Attack

If you want to run the CL attack, you need to train a clean model first. Please use the following command.

python train_clean_withTrans.py --dataset cifar10 --model resnet18 --epochs 200

Models of each epoch are saved at "./saved/benign_model/cifar10/resnet18".

SSBA Attack

If you want to run the SSBA attack, please uncomment the following line in dataloader_bd.py:

from utils.SSBA.encode_image import bd_generator

You also need to refer to https://github.com/SCLBD/ISSBA to download the encoder for imagenet and save it at "./trigger/imagenet_encoder/saved_model.pb".