Availability poisoning is an emerging and increasingly popular research topic that studies stealthy alterations to data that render it unusable for training deep learning models. Recent papers have proposed a number of availability poisoning attacks and defenses, so a benchmark is needed to review current progress and facilitate future research in this area. APBench aims to provide easy-to-use implementations of availability poisoning attack and defense methods, together with a comprehensive evaluation of existing methods. We warmly welcome contributions of your availability poisoning attack and defense methods to APBench.
Paper (TMLR version) • Leaderboard
If you find this benchmark helpful for your research, please cite our paper:
@article{qin2024apbench,
title={{APB}ench: A Unified Availability Poisoning Attack and Defenses Benchmark},
author={Tianrui Qin and Xitong Gao and Juanjuan Zhao and Kejiang Ye and Cheng-zhong Xu},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2024},
url={https://openreview.net/forum?id=igJ2XPNYbJ},
}
To cover the black-box poisoning scenario, we additionally include two transformer-based models, ViT and CaiT.
- vit_small.py: ViT-small model.
- cait_small.py: CaiT-small model.
APBench contains the following attacks and defenses:
Attacks:
- 11 availability poisoning attack methods: DeepConfuse, NTGA, EM, REM, HYPO, TAP, LSP, AR, OPS, UCL, TUE.
Defenses:
- 4 availability poisoning defense methods: AT, ISS, UEraser, AVATAR.
Datasets: CIFAR-10, CIFAR-100, SVHN, ImageNet-Subset (100).
Models: ResNet-18, ResNet-50, SENet-18, MobileNet-V2, DenseNet-121, ViT-small, CaiT-small.
You can run the following commands to set up the required environment:
conda create -n apbench python=3.8
conda activate apbench
pip install -r requirements.txt
├── dataset
│   ├── <Dataset>                      # For clean dataset.
│   ├── <Type>_poisons                 # For poisoned dataset.
│   ├── <Supervised_type>_pure         # Poisoned dataset for supervised learning.
│   └── <Unsupervised_type>_pure       # Poisoned dataset for unsupervised learning.
│       └── <Arch>                     # Unsupervised arch: simclr and moco.
├── defense
│   └── diffusion                      # For defense AVATAR
│       └── pretrained
│           └── score_sde
│               └── checkpoint_8.pth   # Pretrained diffusion model for CIFAR-10
└── log                                # contains checkpoints
    └── <Dataset>                      # Dataset type. e.g. c10, c100, imagenet100, and unsupervised.
        └── <Type>                     # Attack type. e.g. em, rem ...
Download the pretrained checkpoint checkpoint_8.pth from Guided-diffusion and place it according to the directory structure above.
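Before running the AVATAR defense, it can help to verify that the checkpoint is where the code expects it. The sketch below is only an illustration: the nested path is an assumption read off the directory tree above, and `checkpoint_ready` is a hypothetical helper, not part of APBench.

```python
from pathlib import Path

# Assumed location, read off the directory tree above; adjust if your layout differs.
CHECKPOINT = Path("defense") / "diffusion" / "pretrained" / "score_sde" / "checkpoint_8.pth"


def checkpoint_ready(repo_root: str = ".") -> bool:
    """Return True if the pretrained diffusion checkpoint exists under repo_root."""
    return (Path(repo_root) / CHECKPOINT).is_file()


# Report the status for the current working directory.
if checkpoint_ready():
    print("checkpoint found")
else:
    print(f"checkpoint missing, expected at: {CHECKPOINT.as_posix()}")
```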
Step 1: Generate poisoned datasets: For example, to generate the poisoned datasets for EM, run the demo script below:
python poisons_generate.py --type em --dataset <Dataset> --eps <Eps_bound>
The parameter choices for the above command are as follows:
- `--dataset <Dataset>`: `c10`, `c100`, `svhn`, `imagenet100`.
- `--type <Attack>`: `ar`, `dc`, `em`, `rem`, `hypo`, `tap`, `lsp`, `ntga`, `ops`.

P.S. `em`, `rem` and `lsp` support [`c10`, `c100`, `svhn`, `imagenet100`]; `ops` and `ar` support [`c10`, `c100`, `svhn`]; `dc`, `hypo`, `tap` and `ntga` support [`c10`].
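The attack/dataset compatibility note above can be encoded as a small lookup so that unsupported combinations are rejected before a long generation run starts. This is a minimal sketch: `SUPPORTED` and `poison_command` are hypothetical names, the table is transcribed from the note above, and the value format of `--eps` is not specified here, so it is passed through as an opaque string.

```python
from typing import Dict, List, Set

ALL_FOUR = {"c10", "c100", "svhn", "imagenet100"}
NO_IMAGENET = {"c10", "c100", "svhn"}

# Attack -> supported datasets, transcribed from the note above.
SUPPORTED: Dict[str, Set[str]] = {
    "em": ALL_FOUR, "rem": ALL_FOUR, "lsp": ALL_FOUR,
    "ops": NO_IMAGENET, "ar": NO_IMAGENET,
    "dc": {"c10"}, "hypo": {"c10"}, "tap": {"c10"}, "ntga": {"c10"},
}


def poison_command(attack: str, dataset: str, eps: str) -> List[str]:
    """Build the poisons_generate.py command line, rejecting unsupported pairs."""
    if attack not in SUPPORTED:
        raise ValueError(f"unknown attack: {attack}")
    if dataset not in SUPPORTED[attack]:
        raise ValueError(f"{attack} does not support dataset {dataset}")
    return ["python", "poisons_generate.py",
            "--type", attack, "--dataset", dataset, "--eps", eps]


# Print the command only; replace the placeholder with your own bound.
print(" ".join(poison_command("em", "c10", "<Eps_bound>")))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch the script from Python.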
Step 2: Train on poisoned datasets: Once the poisoned dataset has been generated, you can train the model with the demo script below:
python train.py --dataset <Dataset> --<Defense> --arch <Model_arch> --type <Attack>
The parameter choices for the above command are as follows:
- `--dataset <Dataset>`: `c10`, `c100`, `svhn`.
- `--<Defense>`: `nodefense`, `cutout`, `cutmix`, `mixup`, `bdr`, `gray`, `jpeg`, `gaussian`, `ueraser`, `at`.
- `--arch <Model_arch>`: `r18`, `r50`, `se18`, `mv2`, `de121`, `vit`, `cait`.
- `--type <Attack>`: `ar`, `dc`, `em`, `rem`, `hypo`, `tap`, `lsp`, `ntga`, `ops`.
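To compare several defenses on the same poisoned dataset, the training command can be generated in a loop. The sketch below only builds and prints the command lines listed above; `train_command` is a hypothetical helper, and it assumes each defense is passed as a bare flag (for example `--cutout`), as the `--<Defense>` placeholder suggests.

```python
from typing import List

# Option values copied from the parameter list above.
DEFENSES = ("nodefense", "cutout", "cutmix", "mixup", "bdr",
            "gray", "jpeg", "gaussian", "ueraser", "at")
ARCHS = ("r18", "r50", "se18", "mv2", "de121", "vit", "cait")


def train_command(dataset: str, defense: str, arch: str, attack: str) -> List[str]:
    """Build the train.py command line for one (defense, arch) configuration."""
    if defense not in DEFENSES:
        raise ValueError(f"unknown defense: {defense}")
    if arch not in ARCHS:
        raise ValueError(f"unknown architecture: {arch}")
    return ["python", "train.py", "--dataset", dataset,
            f"--{defense}", "--arch", arch, "--type", attack]


# Dry run: print one command per defense for EM poisons on CIFAR-10 with ResNet-18.
for defense in DEFENSES:
    print(" ".join(train_command("c10", defense, "r18", "em")))
```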
The above process does not include the AVATAR defense. To use AVATAR, follow the steps below:
Step 1: Generate poisoned datasets: For AVATAR, first generate the poisoned datasets with the script above.
Step 2: Generate purified datasets: Once the poisoned dataset has been generated, generate the purified dataset with the script below:
python pure_gen.py --dataset <Dataset> --type <Attack>
Step 3: Train on purified datasets: Then, train the model on the purified dataset with the script below:
python train.py --pure --dataset <Dataset> --arch <Model_arch> --type <Attack>
The parameter choices for the above commands are as follows:
- `--dataset <Dataset>`: `c10`, `c100`, `svhn`.
- `--arch <Model_arch>`: `r18`, `r50`, `se18`, `mv2`, `de121`, `vit`, `cait`.
- `--type <Attack>`: `ar`, `dc`, `em`, `rem`, `hypo`, `tap`, `lsp`, `ntga`, `ops`.
The trained checkpoints will be saved at `log/<Dataset>/<Attack>/`.
You need to confirm that the target poisoned dataset has been generated in advance.
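Because each AVATAR step depends on the output of the previous one, the three commands can be chained so that a failure stops the pipeline. The following is a sketch under stated assumptions: `avatar_commands`, `checkpoint_dir`, and `run` are hypothetical helpers, the `--eps` value is passed through as an opaque string, and the run defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def avatar_commands(dataset: str, arch: str, attack: str, eps: str) -> List[List[str]]:
    """Return the three AVATAR steps in order: poison, purify, train."""
    return [
        ["python", "poisons_generate.py", "--type", attack,
         "--dataset", dataset, "--eps", eps],
        ["python", "pure_gen.py", "--dataset", dataset, "--type", attack],
        ["python", "train.py", "--pure", "--dataset", dataset,
         "--arch", arch, "--type", attack],
    ]


def checkpoint_dir(dataset: str, attack: str) -> Path:
    """Directory where trained checkpoints are saved: log/<Dataset>/<Attack>/."""
    return Path("log") / dataset / attack


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, execute it and stop on failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


run(avatar_commands("c10", "r18", "em", "<Eps_bound>"))
print("checkpoints expected under:", checkpoint_dir("c10", "em").as_posix())
```

Skip the first command if the poisoned dataset already exists.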
Attacks | File name |
---|---|
DeepConfuse | dc_poisons.py |
NTGA | ntga_poisons.py |
EM | em_poisons.py |
REM | rem_poisons.py |
HYPO | hypo_poisons.py |
TAP | tap_poisons.py |
LSP | lsp_poisons.py |
AR | ar_poisons.py |
OPS | ops_poisons.py |
Defenses | File name |
---|---|
AT | madrys.py |
ISS | - |
UEraser | ueraser.py |
AVATAR | diffpure.py |
You can refer to these files and modify them according to your needs.
For unsupervised methods, you can download perturbations.pt from the respective repositories (UCL and TUE). Then, you can train the unsupervised model with the demo script below:
python us_train.py --dataset <Dataset> --arch <Model_arch> --type <Attack>
The parameter choices for the above command are as follows:
- `--dataset <Dataset>`: `c10` and `c100`.
- `--<Defense>`: `jpeg` and `gray`.
- `--arch <Model_arch>`: `simclr` and `moco`.
- `--type <Attack>`: `ucl` and `tue`.
For UEraser and AVATAR, first generate the processed dataset with the script below:
python pure_us_gen.py --dataset <Dataset> --arch <Model_arch> --defense <Defense>
- `--defense <Defense>`: `ueraser` for UEraser and `pure` for AVATAR.

Then, you can train the unsupervised model on the dataset processed by UEraser or AVATAR with the demo script below:
python us_train_pu.py --dataset <Dataset> --arch <Model_arch> --defense <Defense>
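The two unsupervised defense steps share the same arguments, so they can be generated together from the method name. This sketch is illustrative only: `DEFENSE_VALUE` and `unsupervised_defense_commands` are hypothetical names, and the mapping from method to `--defense` value is taken from the list above.

```python
from typing import Dict, List

# Method name -> value of --defense, as listed above.
DEFENSE_VALUE: Dict[str, str] = {"UEraser": "ueraser", "AVATAR": "pure"}


def unsupervised_defense_commands(dataset: str, arch: str, method: str) -> List[List[str]]:
    """Return [processing command, training command] for UEraser or AVATAR."""
    if method not in DEFENSE_VALUE:
        raise ValueError(f"unknown defense method: {method}")
    shared = ["--dataset", dataset, "--arch", arch,
              "--defense", DEFENSE_VALUE[method]]
    return [
        ["python", "pure_us_gen.py"] + shared,   # step 1: build the processed dataset
        ["python", "us_train_pu.py"] + shared,   # step 2: train on the processed dataset
    ]


# Dry run: print both steps for AVATAR with SimCLR on CIFAR-10.
for cmd in unsupervised_defense_commands("c10", "simclr", "AVATAR"):
    print(" ".join(cmd))
```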
We use the pre-processed ImageNet-100 (Download Link). You can also obtain ImageNet-100 by slicing ImageNet-1K, which gives a slightly different sample size.