/UAP_VLP

Universal Adversarial Perturbations for Vision-Language Pre-trained Models


This is the PyTorch implementation of the paper "Universal Adversarial Perturbations for Vision-Language Pre-trained Models", published at SIGIR 2024.

Requirements

  • pytorch 1.10.2
  • transformers 4.8.1
  • timm 0.4.9
  • bert_score 0.3.11
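The pinned versions above can be checked before running anything. The following is a minimal sketch, not part of the repository: it assumes the packages are installed under the distribution names `torch`, `transformers`, `timm`, and `bert_score`, and the helper name `check_versions` is hypothetical.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above. The distribution
# names ("torch" for pytorch, "bert_score") are assumptions about how the
# packages were installed in your environment.
PINNED = {
    "torch": "1.10.2",
    "transformers": "4.8.1",
    "timm": "0.4.9",
    "bert_score": "0.3.11",
}


def check_versions(pinned):
    """Return {name: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Builds such as "1.10.2+cu113" carry a local suffix; compare the base.
        base = installed.split("+")[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package matches. Newer versions may also work, but only the versions listed above are stated by the authors.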

Prepare datasets and models

Download the Flickr30k and MSCOCO datasets and put them into ./Dataset (the annotations are provided in ./data_annotation/). Then set the dataset root path through the image_root field in ./configs/Retrieval_flickr.yaml.
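If you prefer to set image_root from a script rather than by hand, the edit can be sketched as below. This is an illustration under assumptions: it assumes image_root is a single-line, top-level key in the YAML file, it discards any trailing comment on that line, and the helper name `set_image_root` is hypothetical.

```python
import re
from pathlib import Path


def set_image_root(config_path, new_root):
    """Rewrite the top-level `image_root:` line of a YAML file in place.

    Assumes image_root is a single-line, top-level scalar. Returns True
    if a line was replaced, False if no such key was found.
    """
    path = Path(config_path)
    text = path.read_text()
    pattern = re.compile(r"^image_root:.*$", flags=re.MULTILINE)
    # YAML single-quoted strings escape a quote by doubling it.
    quoted = "'" + str(new_root).replace("'", "''") + "'"
    # A callable replacement keeps backslashes in the path literal.
    new_text, count = pattern.subn(lambda _m: "image_root: " + quoted, text, count=1)
    if count:
        path.write_text(new_text)
    return bool(count)
```

A call would look like `set_image_root("./configs/Retrieval_flickr.yaml", "/path/to/Dataset")`, where `/path/to/Dataset` is a placeholder for your own dataset location.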

The checkpoints of the fine-tuned VLP models are available from CLIP, ALBEF, TCL, and BLIP. Download them and put them into ./checkpoint.
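A quick pre-flight check can confirm that the folders and config file named above are in place. This is a minimal sketch: the list contains only paths mentioned in this README, and the helper name `missing_paths` is hypothetical.

```python
from pathlib import Path

# Paths named in the instructions above, relative to the repository root.
EXPECTED_PATHS = [
    "Dataset",
    "checkpoint",
    "data_annotation",
    "configs/Retrieval_flickr.yaml",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when everything is present. It checks existence only, not whether the datasets or checkpoints are complete.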

Learn universal adversarial perturbations

Before running the main files, set the following in each of them: the source/target model names and checkpoints, the dataset names and roots, the test file path, original_rank_index_path, and so on.

# Learn UAPs by taking CLIP as the victim
python Attack_CLIP.py

# Learn UAPs by taking ALBEF/TCL as the victim 
python Attack_ALBEFTCL.py

Evaluation

Image-Text Retrieval

# Eval CLIP models:
python Eval_Retrieval_CLIP.py

# Eval ALBEF models:
python Eval_Retrieval_ALBEF.py

# Eval TCL models:
python Eval_Retrieval_TCL.py
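To run the three retrieval evaluations back to back, a small driver can be sketched as follows. It is not part of the repository: it assumes it is launched from the repository root with the paths inside each script already set, and the names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
import sys

# Script names taken from the commands above.
RETRIEVAL_SCRIPTS = [
    "Eval_Retrieval_CLIP.py",
    "Eval_Retrieval_ALBEF.py",
    "Eval_Retrieval_TCL.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one [interpreter, script] command per script."""
    return [[python, script] for script in scripts]


def run_all(scripts, cwd="."):
    """Run each script in order and stop at the first failure.

    Returns the name of the first script that exits with a non-zero
    status, or None if every script succeeded.
    """
    for cmd in build_commands(scripts):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            return cmd[1]
    return None
```

`run_all(RETRIEVAL_SCRIPTS)` uses the same interpreter that runs the driver, so the evaluation scripts see the same installed packages.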

Visual Grounding

Download the RefCOCO+ dataset from the original website, and set 'image_root' in configs/Grounding.yaml accordingly.

# Eval:
python Eval_Grounding.py

Image Captioning

Download the MSCOCO dataset from the original website, and set 'image_root' in configs/caption_coco.yaml accordingly.

# Eval:
python Eval_ImgCap_BLIP.py

Citation

If you find this code useful for your research, please consider citing our paper.

@inproceedings{zhang2024universal,
  title={Universal Adversarial Perturbations for Vision-Language Pre-trained Models},
  author={Zhang, Peng-Fei and Huang, Zi and Bai, Guangdong},
  booktitle={Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={862--871},
  year={2024}
}
