Dataset list (TODO): STL10, Caltech101, Stanford Cars, SUN397, ImageNet.
CUDA_VISIBLE_DEVICES=5,6 python main_clip_addprompttoken.py --dataset cifar100 --root /local/vondrick/chengzhi/datasets --add_prompt_size 0
We find that the visual prompt fits to the one-hot label embedding rather than the text embedding, which causes overfitting to the one-hot labels and loses the zero-shot ability.
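The distinction is easier to see in code. Below is a minimal, hypothetical sketch (none of the variable names, shapes, or hyperparameters come from this repo) contrasting the two supervision choices: a fresh one-hot classification head versus CLIP's frozen text embeddings.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package, as used by this repo

# Hypothetical sketch (not repo code): two ways to supervise a visual prompt.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep fp32 so the toy example runs on CPU or GPU
for p in model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the prompt (and head) train

classnames = ["apple", "bicycle", "castle"]           # toy label set
images = torch.randn(4, 3, 224, 224, device=device)   # stand-in batch
labels = torch.randint(0, len(classnames), (4,), device=device)

# Learnable full-image prompt added to every input (a padding prompt behaves the same way).
prompt = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)

def image_features(x):
    feats = model.encode_image(x + prompt)
    return feats / feats.norm(dim=-1, keepdim=True)

# (a) One-hot supervision: a fresh linear head over prompt-conditioned features.
#     The prompt co-adapts with this head, drifts away from CLIP's text space,
#     and the zero-shot ability (swapping in new class names) is lost.
head = torch.nn.Linear(model.visual.output_dim, len(classnames)).to(device)
loss_onehot = F.cross_entropy(head(image_features(images)), labels)

# (b) Text supervision: logits are cosine similarities to frozen text embeddings,
#     i.e. CLIP's own zero-shot classifier, so the prompt stays aligned with the
#     text encoder and unseen class names still work at test time.
text = clip.tokenize([f"a photo of a {c}" for c in classnames]).to(device)
with torch.no_grad():
    text_feats = model.encode_text(text)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
logits = model.logit_scale.exp() * image_features(images) @ text_feats.t()
loss_text = F.cross_entropy(logits, labels)
```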
This is the official implementation of the paper Exploring Visual Prompts for Adapting Large-Scale Models.
Clone this repo:
git clone https://github.com/hjbahng/visual_prompting.git
cd visual_prompting
This code requires Python 3+. Install dependencies with:
pip install -r requirements.txt
Prepare the pre-trained models:
bash models/download_models.sh
- Train a visual prompt for CLIP (a sketch of the padding-style prompt appears after this list):
python main_clip.py --dataset cifar100 --root [path_to_cifar100]
- Test the visual prompt:
python main_clip.py --evaluate --resume /path/to/checkpoints/model_best.pth.tar --dataset cifar100 --root [path_to_cifar100]
- Train a visual prompt for a vision model:
python main_vision.py --model bit_m_rn50 --dataset cifar100 --root [path_to_cifar100]
- Test the visual prompt:
python main_vision.py --evaluate --resume /path/to/checkpoints/model_best.pth.tar --model bit_m_rn50 --dataset cifar100 --root [path_to_cifar100]
- There are three model choices: rn50, instagram_resnext101_32x8d, and bit_m_rn50.
- Note that we use --batch_size 32 for instagram_resnext101_32x8d and --batch_size 128 for the other models.
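For orientation, the sketch below shows one plausible form of the padding-style visual prompt that the commands above train: a learnable frame of pixels added around the resized input while the backbone stays frozen. It is an illustrative approximation, not the repository's exact module; the class name, pad width, and learning rate are assumptions.

```python
import torch
import torch.nn as nn

class PadPrompter(nn.Module):
    """Learnable frame of pixels added around each image (illustrative sketch)."""

    def __init__(self, image_size: int = 224, pad_size: int = 30):
        super().__init__()
        inner = image_size - 2 * pad_size
        self.inner = inner
        # Four learnable strips that together form a border around the image.
        self.pad_up = nn.Parameter(torch.randn(1, 3, pad_size, image_size))
        self.pad_down = nn.Parameter(torch.randn(1, 3, pad_size, image_size))
        self.pad_left = nn.Parameter(torch.randn(1, 3, inner, pad_size))
        self.pad_right = nn.Parameter(torch.randn(1, 3, inner, pad_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the border into a full-size prompt and add it to the batch.
        base = torch.zeros(1, 3, self.inner, self.inner, device=x.device)
        middle = torch.cat([self.pad_left, base, self.pad_right], dim=3)
        prompt = torch.cat([self.pad_up, middle, self.pad_down], dim=2)
        return x + prompt  # broadcasts over the batch dimension

# Usage: only the prompter's parameters are optimised; the backbone is frozen.
prompter = PadPrompter()
images = torch.randn(8, 3, 224, 224)
prompted = prompter(images)  # same shape as the input
optimizer = torch.optim.SGD(prompter.parameters(), lr=40.0)  # lr is a placeholder
```

The prompted images are then fed to the frozen model, and only the prompter's parameters receive gradient updates.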
If you use this code for your research, please cite our paper.
@article{bahng2022visual,
  title={Exploring Visual Prompts for Adapting Large-Scale Models},
  author={Hyojin Bahng and Ali Jahanian and Swami Sankaranarayanan and Phillip Isola},
  journal={arXiv preprint arXiv:2203.17274},
  year={2022}
}