[ICLR 2023, Spotlight] Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning

Contrastive Poisoning

Project Page | Paper | BibTex

This repo contains the official PyTorch implementation of Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning (ICLR 2023 Spotlight), by Hao He*, Kaiwen Zha*, Dina Katabi (*co-primary authors).

Setup

  • Install dependencies using conda:

    conda create -n contrastive-poisoning python=3.7
    conda activate contrastive-poisoning
    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
    pip install tensorboard
    pip install pillow==9.0
    pip install gdown
    
    git clone --recursive https://github.com/kaiwenzha/contrastive-poisoning.git
    cd contrastive-poisoning/kornia_pil
    pip install -e .
    cd ..

    In this work, we implemented PIL-based differentiable data augmentations (to match PIL-based torchvision data augmentations) based on kornia, an OpenCV-based differentiable computer vision library.

  • Download datasets (CIFAR-10, CIFAR-100):

    source download_cifar.sh

  • Download all of our pretrained poisons (shown in the table below):

    gdown https://drive.google.com/drive/folders/1FeIHf_tD1bL776Q0PHWGI_rcAkmvQ2iE\?usp\=share_link --folder

Pretrained Poisons

CIFAR-10

Attacker Type   Victim's Algorithm
                SimCLR          MoCo v2         BYOL
CP-S            44.9 / poison   55.1 / poison   59.6 / poison
CP-C            68.0 / poison   61.9 / poison   56.9 / poison

The results in the table above assume that the attacker knows the victim's algorithm, i.e., the attacker and the victim use the same CL algorithm.

BYOL performance may differ slightly from what is reported in the table above and in the paper because, when releasing the code, we replaced the previous synchronized batch normalization implementation, apex.parallel.SyncBatchNorm (now deprecated), with torch.nn.SyncBatchNorm.

To evaluate our pretrained poisons, re-train the corresponding CL model on the poisoned dataset by running

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 main.py \
      --dataset cifar10 \
      --arch resnet18 \
      --cl_alg [SimCLR/MoCov2/BYOL] \
      [--classwise or --samplewise] \
      --delta_weight $[8./255] \
      --folder_name eval_poisons \
      --epochs 1000 \
      --eval_freq 100 \
      --pretrained_delta pretrained_poisons/xxx.pth

Before running, set --cl_alg, --classwise or --samplewise, and --pretrained_delta to match the poison you are evaluating. For example, for the SimCLR CP-S poison (cifar10_res18_simclr_cps.pth), set --cl_alg SimCLR, --samplewise, and --pretrained_delta pretrained_poisons/cifar10_res18_simclr_cps.pth.
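The placeholder substitution described above can also be scripted. The following is a minimal sketch and not part of this repo: build_eval_command and its parameters are hypothetical names, and the flag values simply mirror the evaluation command above. It computes 8/255 in Python because the $[8./255] expression relies on floating-point shell arithmetic, which zsh provides but bash does not.

```python
import shlex

# Hypothetical helper (not part of the repo): assembles the evaluation command
# from the flags documented in this README.
CL_ALGS = ("SimCLR", "MoCov2", "BYOL")
POISON_FLAGS = {"CP-S": "--samplewise", "CP-C": "--classwise"}


def build_eval_command(cl_alg, poison_type, poison_path, gpus=(0, 1, 2, 3)):
    """Return (env, argv) for re-training `cl_alg` on a pretrained poison."""
    if cl_alg not in CL_ALGS:
        raise ValueError("unknown --cl_alg value: {!r}".format(cl_alg))
    if poison_type not in POISON_FLAGS:
        raise ValueError("unknown poison type: {!r}".format(poison_type))
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus)}
    argv = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(len(gpus)),  # one process per visible GPU
        "main.py",
        "--dataset", "cifar10",
        "--arch", "resnet18",
        "--cl_alg", cl_alg,
        POISON_FLAGS[poison_type],
        "--delta_weight", repr(8 / 255),  # computed here instead of by the shell
        "--folder_name", "eval_poisons",
        "--epochs", "1000",
        "--eval_freq", "100",
        "--pretrained_delta", poison_path,
    ]
    return env, argv


env, argv = build_eval_command(
    "SimCLR", "CP-S", "pretrained_poisons/cifar10_res18_simclr_cps.pth"
)
# Print a copy-pasteable command line with every argument shell-quoted.
print(
    "CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"],
    " ".join(shlex.quote(a) for a in argv),
)
```

Deriving --nproc_per_node from the GPU list keeps the process count and CUDA_VISIBLE_DEVICES in sync when you run on fewer than four GPUs.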

Training

This code supports training on CIFAR-10 and CIFAR-100.

Contrastive Learning Baselines

To train a contrastive learning (CL) model (e.g., SimCLR, MoCov2, BYOL) on the clean dataset, run

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 main.py \
      --dataset cifar10 \
      --arch resnet18 \
      --cl_alg [SimCLR/MoCov2/BYOL] \
      --folder_name baseline \
      --baseline \
      --epochs 1000 \
      --eval_freq 100

Class-wise Contrastive Poisoning (CP-C)

  1. Run CP-C to generate the class-wise poison

    CUDA_VISIBLE_DEVICES=0,1,2,3 \
    python -m torch.distributed.launch --nproc_per_node=4 main.py \
          --dataset cifar10 \
          --arch resnet18 \
          --cl_alg [SimCLR/MoCov2/BYOL] \
          --classwise \
          --delta_weight $[8./255] \
          --folder_name CP_C \
          --epochs 1000 \
          --eval_freq 10000 \
          --print_freq 5 \
          --num_steps 1 \
          --step_size 0.1 \
          --model_step 20 \
          --noise_step 20 \
          [--allow_mmt_grad]

    Add the --allow_mmt_grad flag to enable dual-branch propagation when running on MoCov2 and BYOL.

  2. Re-train the CL model (e.g., SimCLR, MoCov2, BYOL) on the poisoned dataset generated by CP-C

    CUDA_VISIBLE_DEVICES=0,1,2,3 \
    python -m torch.distributed.launch --nproc_per_node=4 main.py \
          --dataset cifar10 \
          --arch resnet18 \
          --cl_alg [SimCLR/MoCov2/BYOL] \
          --classwise \
          --delta_weight $[8./255] \
          --folder_name CP_C \
          --epochs 1000 \
          --eval_freq 100 \
          --pretrained_delta <.../last.pth>

    --pretrained_delta is the path to the model checkpoint from step 1, which contains the generated poison.
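The --delta_weight value of 8/255 suggests a perturbation bounded in the L-infinity norm on images scaled to [0, 1]. Under that assumption, one bounded update of a poison can be sketched as below. This is an illustration only, not the repo's implementation: the function names, the signed-gradient rule, the descent direction (assumed here to lower the contrastive loss), and the step value are all assumptions, and the sketch does not model --num_steps, --model_step, or --noise_step.

```python
EPS = 8 / 255  # assumed L-infinity bound, matching --delta_weight in the commands above


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def poison_step(delta, grad, step, eps=EPS):
    """One signed-gradient descent step on the perturbation, then
    projection back into the box [-eps, eps] (hypothetical update rule)."""
    return [max(-eps, min(eps, d - step * sign(g))) for d, g in zip(delta, grad)]


def apply_poison(pixels, delta):
    """Add the perturbation and clamp to the valid pixel range [0, 1]."""
    return [max(0.0, min(1.0, p + d)) for p, d in zip(pixels, delta)]


# Toy example on three "pixels"; the gradient values are made up.
delta = poison_step([0.0, 0.03, -0.03], grad=[1.0, -1.0, 1.0], step=0.01)
poisoned = apply_poison([0.0, 0.99, 0.5], delta)
print(delta, poisoned)
```

The projection keeps every entry of the perturbation within 8/255 of zero, which on 8-bit images corresponds to at most 8 intensity levels per channel.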

Sample-wise Contrastive Poisoning (CP-S)

  1. Run CP-S to generate the sample-wise poison

    CUDA_VISIBLE_DEVICES=0,1,2,3 \
    python -m torch.distributed.launch --nproc_per_node=4 main.py \
          --dataset cifar10 \
          --arch resnet18 \
          --cl_alg [SimCLR/MoCov2/BYOL] \
          --samplewise \
          --delta_weight $[8./255] \
          --folder_name CP_S \
          --epochs 200 \
          --eval_freq 10000 \
          --num_steps 5 \
          --step_size 0.1 \
          --initialized_delta <.../last.pth or pretrained_poisons/cifar10_res18_xxx_cpc.pth> \
          [--allow_mmt_grad]

    • To get a stronger poison, we initialize the sample-wise poison with a learned class-wise poison. --initialized_delta can be set either to the path of the model checkpoint produced by CP-C step 1 or to one of our generated CP-C poisons in the pretrained_poisons folder (note: the CL algorithm must match).
    • Add the --allow_mmt_grad flag to enable dual-branch propagation when running on MoCov2 and BYOL.

  2. Re-train the CL model (e.g., SimCLR, MoCov2, BYOL) on the poisoned dataset generated by CP-S

    CUDA_VISIBLE_DEVICES=0,1,2,3 \
    python -m torch.distributed.launch --nproc_per_node=4 main.py \
          --dataset cifar10 \
          --arch resnet18 \
          --cl_alg [SimCLR/MoCov2/BYOL] \
          --samplewise \
          --delta_weight $[8./255] \
          --folder_name CP_S \
          --epochs 1000 \
          --eval_freq 100 \
          --pretrained_delta <.../last.pth or .../ckpt_epoch_160.pth>

    --pretrained_delta is the path to the model checkpoint from step 1, which contains the generated poison. Use <.../last.pth> for MoCov2 and BYOL, and <.../ckpt_epoch_160.pth> for SimCLR.
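Step 1 above initializes the sample-wise poison from a learned class-wise poison. Conceptually, this gives every training sample its own copy of the perturbation learned for its class, which CP-S can then refine per sample. The sketch below illustrates the idea on plain Python lists; the function name and data layout are hypothetical and do not reflect the checkpoint format this repo actually stores.

```python
def init_samplewise_from_classwise(class_deltas, labels):
    """Give every sample its own copy of the perturbation learned for its class.

    class_deltas: mapping from class label to that class's perturbation (a list).
    labels: class label of each training sample, in dataset order.
    """
    # list(...) makes an independent copy, so refining one sample's
    # perturbation later does not change any other sample's.
    return [list(class_deltas[y]) for y in labels]


# Toy example: two classes, three samples, two "pixels" per perturbation.
class_deltas = {0: [0.01, -0.02], 1: [0.0, 0.03]}
labels = [0, 1, 0]
sample_deltas = init_samplewise_from_classwise(class_deltas, labels)
print(sample_deltas)
```

A class-wise poison therefore stores one perturbation per class, while the sample-wise poison derived from it stores one per training sample.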

Model Resuming

To resume an interrupted run of any command above, keep the command unchanged and add --resume <.../curr_last.pth>, where the argument is the full path to the latest checkpoint (curr_last.pth) of the interrupted run.
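If you launch runs from a script, the rule above (same command, plus --resume) can be applied mechanically. The helper below is a minimal sketch with a hypothetical name, with_resume, and a placeholder checkpoint path. It works on an argument list and replaces any earlier --resume value rather than adding a second one.

```python
def with_resume(argv, checkpoint_path):
    """Return a copy of argv with --resume set to checkpoint_path.

    Any existing "--resume <path>" pair is dropped first, so the flag
    appears exactly once in the result. The input list is not modified.
    """
    out = []
    skip_next = False
    for arg in argv:
        if skip_next:  # this is the value of an earlier --resume; drop it
            skip_next = False
            continue
        if arg == "--resume":
            skip_next = True
            continue
        out.append(arg)
    return out + ["--resume", checkpoint_path]


# Placeholder path for illustration; use the real location of curr_last.pth.
base = ["python", "main.py", "--cl_alg", "SimCLR", "--epochs", "1000"]
print(with_resume(base, "path/to/curr_last.pth"))
```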

Acknowledgements

This code is partly based on the open-source implementations from SupContrast, MoCo, lightly and kornia.

Citation

If you use this code for your research, please cite our paper:

@inproceedings{he2023indiscriminate,
    title={Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning},
    author={Hao He and Kaiwen Zha and Dina Katabi},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=f0a_dWEYg-Td}
}