ConceptExpress

This is the official PyTorch codes for the paper:

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong
The University of Hong Kong
ECCV 2024 (Oral)

We present Unsupervised Concept Extraction (UCE) that focuses on the unsupervised problem of extracting multiple concepts from a single image.

Project Page

The dataset of input images used in our paper is now available at this link. All images in this dataset are sourced from Unsplash under a license that allows free download and use!

Set-up

Create a conda environment uce using

conda env create -f environment.yml
conda activate uce
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

Training

Create a new folder that contains an img.jpg. For example, download our dataset and put it under the root path. You can change --instance_data_dir in bash file scripts/train.sh to uce_images/XX or any other image path you like. You can specify --output_dir to save the checkpoints.

When the above is ready, run the following to start training:

bash scripts/train.sh

The learned token embeddings of all concepts are saved to .bin files under your --output_dir.

Inference

Once trained, the i-th concept is represented as <asset$i> in the tokenizer. We can then freely generate images using any concept token <asset$i> (replace $i with a valid concept index):

python infer.py \
  --embed_path $CKPT_BIN_FILE \
  --prompt "a photo of <asset$i> in the snow" \
  --save_path $SAVE_FOLDER \
  --seed 0

Please specify $CKPT_BIN_FILE which is the .bin file path of your learned token embeddings, and $SAVE_FOLDER to save the generated images. You can also find inference examples in scripts/infer.sh.

Citation

If you use this code in your research, please consider citing our paper:

@InProceedings{hao2024conceptexpress,
    title={Concept{E}xpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction}, 
    author={Shaozhe Hao and Kai Han and Zhengyao Lv and Shihao Zhao and Kwan-Yee~K. Wong},
    booktitle={ECCV},
    year={2024},
}

Acknowledgements

This code repository is based on the great work of Break-A-Scene. Thanks!