Create a conda environment and install dependencies:

```bash
conda create -y -n torch180 python=3.8
conda activate torch180
pip3 install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
pip install -r requirements.txt
```
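Optionally, run a quick sanity check that the LTS build and the CUDA runtime are visible from Python:

```bash
# Should report torch 1.8.2 (cu111 build) and True on a CUDA 11.1 machine.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```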
Follow DATASETS.md to prepare the datasets used in the paper, or run the following script to download all 11 datasets, including ImageNet:

```bash
bash scripts/data.sh
```
Download the pretrained prompt from the link and decompress it under the folder `prompt_adapter/prompt_tensor_init`:

```bash
tar -xvf prompt_tensor_init.tar
```
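To confirm the files land in the expected folder, you can list the archive contents before extracting:

```bash
# Inspect the archive layout without unpacking it.
tar -tf prompt_tensor_init.tar
```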
The running configurations can be modified in `configs/dataset.yaml`, including the shot number, the visual encoder, and other hyperparameters. To reproduce our evaluation at 1, 2, 4, 8, 16, and 20 shots, change the shot setting in the config first and then run the corresponding script below (a sweep sketch follows the run commands).
Note that `load_cache` and `load_pre_feat` default to `False` for the first run, which stores the cache model and the val/test features under `configs/dataset/`. For later runs, they can be set to `True` for faster hyperparameter tuning.
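For example, a minimal shell sketch that flips both flags in a dataset config; the exact key names and YAML layout are assumptions based on the description above, so adjust the patterns to the real file:

```bash
# Hypothetical: enable cache reuse after the first run has written
# the cache model and val/test features (key format assumed).
sed -i 's/^load_cache: False/load_cache: True/' configs/oxford_pets.yaml
sed -i 's/^load_pre_feat: False/load_pre_feat: True/' configs/oxford_pets.yaml
```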
For the ImageNet dataset:

```bash
python main_imagenet.py --config configs/imagenet.yaml
```

For the other 10 datasets, replace the config with the one for each dataset, e.g. Oxford Pets:

```bash
python main.py --config configs/oxford_pets.yaml
```
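As referenced above, a hedged sketch of sweeping all shot settings for one dataset; it assumes the config exposes a top-level `shots:` key, so adjust the `sed` pattern to the actual YAML:

```bash
# Hypothetical sweep over all shot settings used in the paper.
# Assumes a top-level `shots:` key in the dataset config.
for SHOTS in 1 2 4 8 16 20; do
    sed -i "s/^shots: .*/shots: ${SHOTS}/" configs/oxford_pets.yaml
    python main.py --config configs/oxford_pets.yaml
done
```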
This repo benefits from Tip-Adapter and CoOp. Thanks for their wonderful work.
```bibtex
@article{sun2023prompt,
  title={Prompt Tuning based Adapter for Vision-Language Model Adaption},
  author={Sun, Jingchen and Qin, Jiayu and Lin, Zihao and Chen, Changyou},
  journal={arXiv preprint arXiv:2303.15234},
  year={2023}
}
```