BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

We provide the official PyTorch implementation of 'BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning' (CVPR 2023).

Changdae Oh, Hyeji Hwang, Hee-young Lee, YongTaek Lim, Geunyoung Jung, Jiyoung Jung, Hosik Choi, and Kyungwoo Song


Abstract

With the surge of large-scale pre-trained models (PTMs), fine-tuning them for numerous downstream tasks has become a crucial problem, and parameter-efficient transfer learning (PETL) of large models has consequently attracted considerable attention. While recent PETL methods showcase impressive performance, they rely on optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) sufficient memory is available for fine-tuning. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without explicit parameter access. Moreover, the large memory requirements of modern PTMs are hard to meet. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without any knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness to distribution/location shift. SPSA-GC efficiently estimates the gradient of the target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without access to the PTMs' parameters and with minimal memory requirements.
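
To make the Coordinator idea above concrete, below is a minimal PyTorch-style sketch of an input-dependent, image-shaped prompt generator. The module names, layer sizes, and mixing coefficient eps are illustrative assumptions for this README (eps loosely plays the role of the p_eps argument in the run scripts below); it is not the actual Coordinator in this repository, where the encoder is a frozen pre-trained feature extractor and only a lightweight decoder is trained.

# Illustrative sketch only: an input-dependent, image-shaped prompt generator.
# Layer sizes and names are assumptions, not the repository's actual Coordinator.
import torch
import torch.nn as nn

class ToyCoordinator(nn.Module):
    """Maps an image to an image-shaped prompt with a small number of parameters."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # In the paper, this part is a frozen pre-trained feature extractor;
        # a small conv stack stands in for it here.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
        )
        # Lightweight decoder: the part that would be trained (in a black-box
        # manner, via SPSA-GC) to produce the visual prompt.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 32, kernel_size=8, stride=8), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=4), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
        prompt = self.decoder(self.encoder(x))          # image-shaped prompt
        return torch.clamp(x + eps * prompt, 0.0, 1.0)  # prompted image

if __name__ == "__main__":
    x = torch.rand(2, 3, 224, 224)
    print(ToyCoordinator()(x).shape)  # torch.Size([2, 3, 224, 224])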


Research Highlights

  • Input-Dependent Dynamic Visual Prompting: To the best of our knowledge, this is the first work to explore input-dependent visual prompting in the black-box setting. To this end, we devise the Coordinator, which reparameterizes the prompt as an autoencoder so that input-dependent prompts can be handled with a tiny number of parameters.
  • New Algorithm for Black-Box Optimization: We propose a new zeroth-order optimization algorithm, SPSA-GC, which applies look-ahead corrections to SPSA's estimated gradients, resulting in boosted performance (see the sketch after this list).
  • End-to-End Black-Box Visual Prompting: Equipped with the Coordinator and SPSA-GC, BlackVIP adapts PTMs to downstream tasks without parameter access or a large memory footprint.
  • Empirical Results: We extensively validate BlackVIP on 16 datasets and demonstrate its effectiveness in terms of few-shot adaptability and robustness to distribution/object-location shift.
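
The sketch below illustrates the SPSA-GC idea referenced above: a two-query SPSA gradient estimate combined with a Nesterov-style look-ahead correction. It is a minimal NumPy illustration whose hyperparameter names echo the spsa_a, spsa_c, spsa_alpha, spsa_gamma, and moms arguments of the run scripts below, but the values are toy choices; treat it as a sketch of the technique, not the exact update rule implemented in this repository.

# Illustrative sketch only: SPSA gradient estimation with a Nesterov-style
# look-ahead correction. Values below are toy choices for a small demo.
import numpy as np

def spsa_gc_minimize(loss_fn, theta, steps=500, a=0.02, c=0.05,
                     alpha=0.602, gamma=0.101, beta=0.9, seed=0):
    """Black-box minimization of loss_fn using two function queries per step."""
    rng = np.random.default_rng(seed)
    m = np.zeros_like(theta)                               # momentum buffer
    for t in range(1, steps + 1):
        a_t = a / t ** alpha                               # decaying step size
        c_t = c / t ** gamma                               # decaying perturbation
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher direction
        look = theta + beta * m                            # look-ahead point
        g_hat = (loss_fn(look + c_t * delta) - loss_fn(look - c_t * delta)) \
                / (2.0 * c_t) * delta                      # SPSA gradient estimate
        m = beta * m - a_t * g_hat                         # corrected (NAG-style) step
        theta = theta + m
    return theta

if __name__ == "__main__":
    # Toy quadratic stands in for the loss returned by a black-box model.
    target = np.ones(10)
    theta = spsa_gc_minimize(lambda p: np.sum((p - target) ** 2), np.zeros(10))
    print(float(np.sum((theta - target) ** 2)))  # should be far below the initial loss of 10.0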


Coverage of this repository

Methods

  • BlackVIP (Ours)
  • BAR
  • VP (with our SPSA-GC)
  • VP
  • Zero-Shot Inference

Experiments

  • main performance (Tab. 2 and Tab. 3 of the paper)
    • two synthetic datasets - [Biased MNIST, Loc-MNIST]
    • 14 transfer learning benchmarks - [Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, SVHN, EuroSAT, Resisc45, CLEVR, UCF101, ImageNet]
  • ablation study (Tab. 5 and Tab. 6 of the paper)
    • varying architectures (coordinator, target model)
    • varying coordinator weights and optimizers

Setup

  • Run the following commands to create the environment.
    • Note that we slightly modified Dassl.pytorch into my_dassl for flexible experiments.
# Clone this repo
git clone https://github.com/changdaeoh/BlackVIP.git
cd BlackVIP

# Create a conda environment
conda create -y -n blackvip python=3.8

# Activate the environment
conda activate blackvip

# Install torch and torchvision
# Please refer to https://pytorch.org/ if you need a different cuda version
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.6 -c pytorch -c conda-forge

# Install dependencies
cd my_dassl
pip install -r requirements.txt

# Install additional requirements
cd ..
pip install -r requirements.txt

Data preparation

  • To prepare the following 11 datasets (adopted by CoOp), please follow the instructions at https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md
    • Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, EuroSAT, UCF101, and ImageNet
    • We use the same few-shot splits as CoOp for the above 11 datasets.
  • The remaining three benchmark datasets (SVHN, Resisc45, and CLEVR) are adopted from VP; prepare them following VP's instructions.
  • To prepare our synthetic dataset, Loc-MNIST, run datasets/mk_locmnist.py as: python mk_locmnist.py --data_root [YOUR-DATAPATH] --f_size [1 or 4]
  • For Biased MNIST, no additional preparation is required.

Run

transfer learning benchmarks

  • Move to the BlackVIP/scripts/method_name directory.
  • For the 14 benchmark datasets and four methods, you can refer to the docs containing the hyperparameter tables.
  • On the target dataset, run the commands with dataset-specific configurations as below:
# for BlackVIP, specify {1:dataset, 2:epoch, 3:moms, 4:spsa_gamma, 5:spsa_c, 6:p_eps}
sh tl_bench.sh svhn 5000 0.9 0.2 0.005 1.0

# for BAR, specify {1:dataset, 2:epoch, 3:init_lr, 4:min_lr}
sh tl_bench.sh svhn 5000 5.0 0.1

# for VP w/ SPSA-GC, specify {1:dataset, 2:epoch, 3:moms, 4:spsa_a, 5:spsa_c}
sh tl_bench.sh svhn 5000 0.9 10.0 0.01

# for VP (white-box), specify {1:dataset, 2:epoch, 3:lr}
sh tl_bench.sh svhn 1000 40.0

# for Zero-shot CLIP inference, move to 'BlackVIP/scripts/coop' and run:
sh zeroshot_all.sh

synthetic datasets

  • In BlackVIP/scripts/method_name/, there are three files to reproduce the results on Biased MNIST and Loc-MNIST: synthetic_bm_easy.sh, synthetic_bm_hard.sh, and synthetic_lm.sh
# for BlackVIP on Loc-MNIST, specify {1:fake-digit-size, 2:moms, 3:spsa_alpha, 4:spsa_a, 5:spsa_c}
sh synthetic_lm.sh 1 0.9 0.5 0.01 0.005  # 1:1 setting
sh synthetic_lm.sh 4 0.95 0.5 0.02 0.01  # 1:4 setting

# for BlackVIP on Biased MNIST, specify {1:moms, 2:spsa_alpha, 3:spsa_a, 4:spsa_c}
sh synthetic_bm_easy.sh 0.9 0.4 0.01 0.01  # spurious correlation = 0.8
sh synthetic_bm_hard.sh 0.9 0.4 0.01 0.01  # spurious correlation = 0.9

# other methods can be run similarly to the above.

ablation study

# for BlackVIP, specify {1:target_backbone, 2:spsa_alpha, 3:moms, 4:spsa_gamma, 5:spsa_c, 6:p_eps}
sh ablation_arch_rn.sh rn50 0.5 0.9 0.2 0.01 0.3


Contact

For any questions, discussions, or proposals, please contact changdae.oh@uos.ac.kr or kyungwoo.song@gmail.com.


Citation

If you use our code in your research, please consider citing:

@inproceedings{oh2023blackvip,
  title={BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning},
  author={Oh, Changdae and Hwang, Hyeji and Lee, Hee-young and Lim, YongTaek and Jung, Geunyoung and Jung, Jiyoung and Choi, Hosik and Song, Kyungwoo},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Acknowledgements

Our overall experimental pipeline is based on the CoOp and CoCoOp repositories. For baseline construction, we borrowed and referred to code from the repositories of VP, BAR, and AR. We appreciate the authors (Zhou et al., Bahng et al., Tsai et al.) and Savan for sharing their code.