DFSP is a model that decomposes the prompt's language feature into a state feature and an object feature, then fuses each with the image feature to improve the responses to states and objects respectively.
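As a rough sketch of this idea (a hypothetical simplification, not the paper's exact architecture), the decomposed state and object text features can each be scored against the image feature and then recombined into composition scores:

import torch
import torch.nn.functional as F

# Hypothetical dimensions: a shared 512-d embedding space, with one text
# feature per candidate state and per candidate object after decomposition.
image_feat = torch.randn(1, 512)
state_feats = torch.randn(115, 512)   # e.g. 115 states in MIT-States
object_feats = torch.randn(245, 512)  # e.g. 245 objects in MIT-States

# Cosine similarity between the image and each decomposed text feature
# yields separate responses for states and for objects.
img = F.normalize(image_feat, dim=-1)
state_logits = img @ F.normalize(state_feats, dim=-1).t()    # (1, 115)
object_logits = img @ F.normalize(object_feats, dim=-1).t()  # (1, 245)

# The two branches are then recombined into a score for every
# state-object composition, here simply additively.
pair_logits = state_logits.unsqueeze(-1) + object_logits.unsqueeze(1)  # (1, 115, 245)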
conda create --name clip python=3.7
conda activate clip
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install git+https://github.com/openai/CLIP.git
Alternatively, you can use pip install -r requirements.txt to install all the dependencies.
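You can sanity-check the environment afterwards (this simply imports the packages, reports CUDA availability, and lists the available CLIP backbones):

python -c "import torch, clip; print(torch.cuda.is_available(), clip.available_models())"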
We experiment with three datasets: MIT-States, UT-Zappos, and C-GQA. To download them, run:
sh download_data.sh
If you have already set up the datasets, you can use symlinks and ensure the following paths exist:
data/<dataset>
where <dataset> = {'mit-states', 'ut-zappos', 'cgqa'}.
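For example, assuming MIT-States already lives at /path/to/mit-states (a placeholder path), link it with:

ln -s /path/to/mit-states data/mit-states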
To train a model, run:
python -u train.py --dataset <dataset>
We evaluate our models in two settings: closed-world and open-world. To evaluate a trained model, run:
python -u test.py --dataset <dataset>
You can replace --dataset with {mit-states, ut-zappos, cgqa}.
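For example, to train and then evaluate on UT-Zappos:

python -u train.py --dataset ut-zappos
python -u test.py --dataset ut-zappos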
For our open-world evaluation, we compute the feasibility calibration and then evaluate on the dataset.
We use GloVe embeddings to compute the similarities between objects and attributes.
Download the GloVe embeddings into the data directory:
cd data
wget https://nlp.stanford.edu/data/glove.6B.zip
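The 300-dimensional vectors are inside the archive, so extract it first (assuming the standard unzip tool is installed):

unzip glove.6B.zip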
Move glove.6B.300d.txt into data/glove.6B.300d.txt.
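As an illustration of how such similarities can be computed (a minimal sketch, not the repository's exact feasibility code), cosine similarity between the GloVe vectors of an attribute and an object looks like this:

import numpy as np

# Load GloVe vectors into a dict mapping word -> 300-d array.
def load_glove(path='data/glove.6B.300d.txt'):
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            word, *values = line.split()
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

glove = load_glove()
# e.g. similarity between the attribute 'wet' and the object 'cat'
print(cosine(glove['wet'], glove['cat']))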
To compute the feasibility calibration for each dataset, run the following command:
python -u feasibility.py --dataset mit-states
The feasibility similarities are saved at data/feasibility_<dataset>.pt.
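The saved file can be loaded back with torch.load for inspection (assuming it is a standard PyTorch-serialized object; its exact structure may differ):

import torch
feasibility = torch.load('data/feasibility_mit-states.pt')
print(type(feasibility))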
To run in the open-world setting, just edit the open-world parameter in config/<dataset>.yml.
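For example (the key name here is hypothetical; check the config file for the actual one):

open_world: True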
If you use this code, please cite:
@article{lu2022decomposed,
title={Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning},
author={Lu, Xiaocheng and Liu, Ziming and Guo, Song and Guo, Jingcai},
journal={arXiv preprint arXiv:2211.10681},
year={2022}
}