A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

A Tale of Two Features explores the complementary nature of Stable Diffusion (SD) and DINOv2 features for zero-shot semantic correspondence. The results demonstrate that a simple fusion of the two features leads to state-of-the-art performance on the SPair-71k, PF-Pascal, and TSS datasets.
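To make the idea of a "simple fusion" concrete, here is a minimal sketch. It assumes the fusion is a weighted concatenation of L2-normalized per-location features, followed by nearest-neighbor matching under cosine similarity. The function names, the `alpha` weight, and the random stand-in features are illustrative assumptions, not the repository's API.

```python
import numpy as np

def l2_normalize(feats, eps=1e-8):
    """Scale each row (one feature vector per location) to unit length."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, eps)

def fuse_features(sd_feats, dino_feats, alpha=0.5):
    """Concatenate normalized SD and DINO features with weights alpha and 1 - alpha.

    sd_feats:   (N, C_sd) array, one row per spatial location
    dino_feats: (N, C_dino) array for the same N locations
    alpha is a hypothetical mixing weight, not a value taken from this README.
    """
    sd = l2_normalize(sd_feats)
    dino = l2_normalize(dino_feats)
    return np.concatenate([alpha * sd, (1.0 - alpha) * dino], axis=1)

def nearest_neighbors(src_fused, tgt_fused):
    """For each source location, return the index of the most similar target location."""
    sim = l2_normalize(src_fused) @ l2_normalize(tgt_fused).T  # cosine similarity
    return sim.argmax(axis=1)

# Toy usage with random stand-in features (real ones would come from the two models).
rng = np.random.default_rng(0)
src = fuse_features(rng.normal(size=(16, 8)), rng.normal(size=(16, 4)))
tgt = src[::-1].copy()  # target holds the same locations in reversed order
matches = nearest_neighbors(src, tgt)
```

In this toy case every source location is matched to its reversed position in the target, since each vector is most similar to its own copy.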

This repository is the official implementation of the paper:

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence. Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa F. Polanía, Varun Jampani, Deqing Sun, Ming-Hsuan Yang. arXiv preprint, 2023.

Visual Results

Dense Correspondence

Object Swapping

Object Swapping (with refinement process)

Environment Setup

To install the required dependencies, use the following commands:

conda create -n sd-dino python=3.9
conda activate sd-dino
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:Junyi42/sd-dino.git 
cd sd-dino
pip install -e .

(Optional) You may also want to install xformers for a more efficient transformer implementation:

pip install xformers==0.0.16

Get Started

Prepare the data

The data folder contains scripts for downloading the datasets. To download a specific dataset, use the corresponding command:

SPair-71k:

bash data/prepare_spair.sh

PF-Pascal:

bash data/prepare_pfpascal.sh

TSS:

bash data/prepare_tss.sh

Evaluate the PCK Results of SPair-71k

Run the pck_spair_pascal.py script:

python pck_spair_pascal.py --SAMPLE 20

Note that --SAMPLE is the number of pairs sampled for each category, which is 20 by default. Set it to 0 to use all the samples, which is the setting used in the paper.

Additional important parameters in pck_spair_pascal.py include:

--NOT_FUSE: if set to True, only use the SD feature.
--ONLY_DINO: if set to True, only use the DINO feature.
--DRAW_DENSE: if set to True, draw the dense correspondence map.
--DRAW_SWAP: if set to True, draw the object swapping result.
--DRAW_GIF: if set to True, draw the object swapping result as a gif.
--TOTAL_SAVE_RESULT: number of samples for which qualitative results are saved; set to 0 to disable saving and speed up the evaluation.

Please refer to the pck_spair_pascal.py file for more details. You may find samples of qualitative results in the results_spair folder.
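For readers unfamiliar with the metric, PCK (percentage of correct keypoints) counts a predicted keypoint as correct when it lies within a threshold distance of its ground-truth location. The sketch below assumes the threshold is `alpha` times the longer side of the object's bounding box; the exact normalization differs between datasets, and the function name and arguments are illustrative rather than taken from pck_spair_pascal.py.

```python
import math

def pck(pred_points, gt_points, bbox_size, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(bbox_size) of the ground truth.

    pred_points, gt_points: lists of (x, y) tuples in the same order
    bbox_size: (width, height) of the object's bounding box (an assumed normalizer)
    """
    if len(pred_points) != len(gt_points):
        raise ValueError("pred_points and gt_points must have the same length")
    if not gt_points:
        return 0.0
    threshold = alpha * max(bbox_size)
    correct = sum(
        math.dist(p, g) <= threshold
        for p, g in zip(pred_points, gt_points)
    )
    return correct / len(gt_points)

# Toy usage: the threshold is 0.1 * 100 = 10 pixels, so the third point is a miss.
pred = [(10.0, 10.0), (50.0, 52.0), (90.0, 40.0)]
gt = [(12.0, 11.0), (50.0, 50.0), (60.0, 40.0)]
score = pck(pred, gt, bbox_size=(100, 80), alpha=0.1)
```

Here two of the three predictions fall within the threshold, so `score` is 2/3.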

Evaluate the PCK Results of PF-Pascal

Run the pck_spair_pascal.py script with the --PASCAL flag:

python pck_spair_pascal.py --PASCAL

You may find samples of qualitative results in the results_pascal folder.

Evaluate the PCK Results of TSS

Run the pck_tss.py script:

python pck_tss.py

You may find samples of qualitative results in the results_tss folder.

Demo

PCA / K-means Visualization of the Features

To extract the fused features of an input image pair and visualize the correspondence, please check the notebook demo_vis_features.ipynb for more details.
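As a rough illustration of what a PCA visualization of features involves, the sketch below projects per-location features onto their top three principal components and rescales them to RGB. It is a generic NumPy version under the assumption of a flattened (height*width, C) feature map with C >= 3. The function name is hypothetical, and the notebook may compute the projection differently.

```python
import numpy as np

def pca_to_rgb(feats, height, width):
    """Project (height*width, C) features onto their top-3 principal components
    and rescale each component to [0, 1] so the result can be shown as an RGB image."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T  # shape (height*width, 3)
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    rgb = (projected - lo) / np.maximum(hi - lo, 1e-8)
    return rgb.reshape(height, width, 3)

# Toy usage with random stand-in features for a 6 x 8 feature map.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(6 * 8, 32))
image = pca_to_rgb(feature_map, height=6, width=8)
```

The resulting array can be passed to any image viewer that accepts float RGB values in [0, 1]; locations with similar features receive similar colors.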

Quick Try on the Object Swapping

To swap the objects in an input image pair, please check the notebook demo_swap.ipynb for more details.
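The core idea of correspondence-based swapping can be sketched as follows, assuming per-pixel features are already available and L2-normalized, and that a mask marks the target object. Each masked target pixel is repainted with the color of its best-matching source pixel. This is a simplified illustration with hypothetical names, not the notebook's implementation, and it omits any refinement step.

```python
import numpy as np

def swap_by_nearest_neighbor(src_image, src_feats, tgt_image, tgt_feats, tgt_mask):
    """Repaint masked target pixels with the colors of their best-matching source pixels.

    src_image, tgt_image: (H, W, 3) arrays
    src_feats, tgt_feats: (H, W, C) per-pixel features, assumed L2-normalized
    tgt_mask: (H, W) boolean array marking the target object
    """
    c = tgt_feats.shape[-1]
    src_flat = src_feats.reshape(-1, c)
    tgt_flat = tgt_feats.reshape(-1, c)[tgt_mask.reshape(-1)]
    # Similarity between every masked target pixel and every source pixel.
    best = (tgt_flat @ src_flat.T).argmax(axis=1)
    out = tgt_image.copy()
    out[tgt_mask] = src_image.reshape(-1, 3)[best]
    return out

# Toy usage: one-hot features make each pixel uniquely identifiable,
# and the target is a horizontally flipped copy of the source layout.
src_feats = np.eye(4).reshape(2, 2, 4)
src_image = np.arange(12).reshape(2, 2, 3)
tgt_feats = src_feats[:, ::-1]
tgt_image = np.zeros_like(src_image)
mask = np.ones((2, 2), dtype=bool)
swapped = swap_by_nearest_neighbor(src_image, src_feats, tgt_image, tgt_feats, mask)
```

In this toy case the output is the source image flipped horizontally, because every target pixel retrieves the color of its mirrored source pixel.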

Refine the Result

TODO

Citation

If you find our work useful, please cite:

@article{zhang2023tale,
  title={{A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence}},
  author={Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Cabrera, Luisa Polania and Jampani, Varun and Sun, Deqing and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2305.15347},
  year={2023}
}

Acknowledgement

Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), DenseMatching, and ncnet. Our heartfelt gratitude goes to the developers of these resources!
