/TMM

Primary LanguagePython

Transferable Multimodal Attack on Vision-Language Pre-training Models

This is the official PyTorch implementation of the paper "Transferable Multimodal Attack on Vision-Language Pre-training Models".

Requirements

  • pytorch 1.10.2
  • transformers 4.8.1
  • timm 0.4.9
  • bert_score 0.3.11

Download

Attack Multimodal Embedding

python EvalTransferAttack.py --adv 1 --gpu 0 \
--config ./configs/Retrieval_flickr.yaml \
--output_dir ./output/Retrieval_flickr \
--checkpoint [Finetuned checkpoint]
--log_name [log_name]
--save_json_name [save_json_name]
--config_name [config_name]
--save_dir [save_dir]