A Tale of Two Features explores the complementary nature of Stable Diffusion (SD) and DINOv2 features for zero-shot semantic correspondence. The results demonstrate that a simple fusion of the two features leads to state-of-the-art performance on the SPair-71k, PF-Pascal, and TSS datasets.
This repository is the official implementation of the paper:
A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa F. Polanía, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
arXiv preprint, 2023.
- Project Page (with additional visual results)
- arXiv Page
To install the required dependencies, use the following commands:
conda create -n sd-dino python=3.9
conda activate sd-dino
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:Junyi42/sd-dino.git
cd sd-dino
pip install -e .
(Optional) You may also want to install xformers for a memory-efficient attention implementation:
pip install xformers==0.0.16
We provide scripts in the data folder to download the datasets. To download a specific dataset, use the corresponding command:
- SPair-71k:
bash data/prepare_spair.sh
- PF-Pascal:
bash data/prepare_pfpascal.sh
- TSS:
bash data/prepare_tss.sh
To evaluate on SPair-71k, run pck_spair_pascal.py:
python pck_spair_pascal.py --SAMPLE 20
Note that --SAMPLE sets the number of sampled pairs for each category and defaults to 20. Set it to 0 to use all the samples (the setting used in the paper).
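The evaluation scripts report PCK (Percentage of Correct Keypoints). The sketch below illustrates both the per-category sampling controlled by --SAMPLE and PCK@alpha itself. It is a minimal, hypothetical illustration, not the repository's code: the helper names are invented here, random sampling with a fixed seed is an assumption (the script may select pairs differently), and `ref_size` stands in for whatever normalization the benchmark protocol uses (for example, the larger side of the object bounding box or of the image). Check pck_spair_pascal.py for the exact choice.

```python
import math
import random
from collections import defaultdict


def sample_pairs_per_category(pairs, n_sample, seed=0):
    """Keep at most n_sample pairs per category; n_sample == 0 keeps every pair.

    Hypothetical helper: random sampling with a fixed seed is an assumption.
    """
    by_category = defaultdict(list)
    for pair in pairs:
        by_category[pair["category"]].append(pair)
    rng = random.Random(seed)
    kept = []
    for category in sorted(by_category):
        group = by_category[category]
        if n_sample == 0 or len(group) <= n_sample:
            kept.extend(group)  # use every available pair
        else:
            kept.extend(rng.sample(group, n_sample))  # subsample without replacement
    return kept


def pck(pred_kps, gt_kps, ref_size, alpha=0.1):
    """PCK@alpha: fraction of predicted keypoints within alpha * ref_size of the ground truth."""
    if len(pred_kps) != len(gt_kps):
        raise ValueError("pred_kps and gt_kps must have the same length")
    if not gt_kps:
        return 0.0
    threshold = alpha * ref_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred_kps, gt_kps)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt_kps)


# Example: threshold = 0.1 * 100 = 10 px; only the first prediction is within it.
print(pck([(10, 10), (50, 50)], [(12, 10), (80, 50)], ref_size=100))  # 0.5
```

Setting `n_sample` to 0 keeps every pair, which mirrors how --SAMPLE 0 is described above.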
Additional important parameters in pck_spair_pascal.py include:
- --NOT_FUSE: if set to True, only use the SD feature.
- --ONLY_DINO: if set to True, only use the DINO feature.
- --DRAW_DENSE: if set to True, draw the dense correspondence map.
- --DRAW_SWAP: if set to True, draw the object swapping result.
- --DRAW_GIF: if set to True, draw the object swapping result as a GIF.
- --TOTAL_SAVE_RESULT: number of samples for which to save qualitative results; set it to 0 to disable saving and speed up evaluation.
Please refer to pck_spair_pascal.py for more details. You can find sample qualitative results in the results_spair folder.
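The --NOT_FUSE and --ONLY_DINO flags choose between the SD feature, the DINO feature, and their fusion. The sketch below shows one simple fusion scheme, consistent with the paper's description as we understand it: L2-normalize each feature separately, then concatenate them with weights alpha and 1 - alpha (0.5 here). It operates on single descriptors for clarity. `select_feature` is a hypothetical helper, and giving `only_dino` precedence when both flags are set is an assumption of this sketch, not documented behavior of the script. The real pipeline works on dense feature maps and may include further steps that this sketch omits.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def select_feature(sd_feat, dino_feat, not_fuse=False, only_dino=False, alpha=0.5):
    """Hypothetical helper mirroring the --NOT_FUSE / --ONLY_DINO switches.

    Giving only_dino precedence when both flags are set is an assumption of this sketch.
    """
    if only_dino:
        return l2_normalize(dino_feat)
    if not_fuse:
        return l2_normalize(sd_feat)
    # Normalize each feature separately so neither dominates by scale,
    # then concatenate them with weights alpha and (1 - alpha).
    sd_part = [alpha * v for v in l2_normalize(sd_feat)]
    dino_part = [(1 - alpha) * v for v in l2_normalize(dino_feat)]
    return sd_part + dino_part


print(select_feature([3.0, 4.0], [0.0, 2.0]))  # [0.3, 0.4, 0.0, 0.5]
```

Normalizing before concatenation matters because the raw SD and DINO features can have very different magnitudes; without it, the larger-norm feature would dominate any similarity computed on the fused descriptor.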
To evaluate on PF-Pascal, run pck_spair_pascal.py with the --PASCAL flag:
python pck_spair_pascal.py --PASCAL
You can find sample qualitative results in the results_pascal folder.
To evaluate on TSS, run pck_tss.py:
python pck_tss.py
You can find sample qualitative results in the results_tss folder.
To extract the fused features of the input pair images and visualize the correspondence, please check the notebook demo_vis_features.ipynb for more details.
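Once per-location features are extracted, a common way to visualize correspondence is nearest-neighbor matching under cosine similarity. The sketch below is a minimal, hypothetical illustration of that idea on flattened lists of descriptors; the function names are invented here, and the notebook's actual procedure (feature resolution, upsampling, any filtering of matches) may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_matches(src_feats, tgt_feats):
    """For each source descriptor, return the index of the most similar target descriptor."""
    return [
        max(range(len(tgt_feats)), key=lambda j: cosine_similarity(s, tgt_feats[j]))
        for s in src_feats
    ]


def index_to_xy(index, width):
    """Convert a flattened row-major patch index into (x, y) grid coordinates."""
    return index % width, index // width


src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 2.0], [3.0, 0.1], [0.5, 0.5]]
print(nearest_neighbor_matches(src, tgt))  # [1, 0]
print(index_to_xy(5, width=4))  # (1, 1)
```

Matching by cosine similarity rather than Euclidean distance makes the result insensitive to descriptor magnitude, which pairs naturally with the per-feature normalization used for fusion.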
To swap the objects in the input pair images, please check the notebook demo_swap.ipynb for more details.
If you find our work useful, please cite:
@article{zhang2023tale,
title={{A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence}},
author={Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Cabrera, Luisa Polania and Jampani, Varun and Sun, Deqing and Yang, Ming-Hsuan},
journal={arXiv preprint arxiv:2305.15347},
year={2023}
}
Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), DenseMatching, and ncnet. Our heartfelt gratitude goes to the developers of these resources!