This is the official PyTorch code for the paper:
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao,
Kai Han,
Zhengyao Lv,
Shihao Zhao,
Kwan-Yee K. Wong
The University of Hong Kong
ECCV 2024 (Oral)
We present Unsupervised Concept Extraction (UCE), the task of extracting multiple concepts from a single image without supervision.
The dataset of input images used in our paper is now available at this link. All images in this dataset are sourced from Unsplash under a license that allows free download and use!
Create a conda environment named uce using:
conda env create -f environment.yml
conda activate uce
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
Create a new folder that contains an img.jpg. For example, download our dataset and put it under the root path. You can change --instance_data_dir in the bash script scripts/train.sh to uce_images/XX or any other image path you like. You can specify --output_dir to choose where the checkpoints are saved.
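Before launching training, it can help to confirm that the folder passed as --instance_data_dir really contains the input image. The following is a minimal sketch, not part of this repository: the helper name has_input_image is hypothetical, and it assumes the image must be named img.jpg as described above.

```python
from pathlib import Path


def has_input_image(instance_data_dir: str, image_name: str = "img.jpg") -> bool:
    """Return True if the folder exists and contains the expected image file.

    Hypothetical helper: the default file name img.jpg follows the setup
    instructions; it is not read from scripts/train.sh.
    """
    folder = Path(instance_data_dir)
    return folder.is_dir() and (folder / image_name).is_file()
```

For example, has_input_image("uce_images/XX") returns False if the folder is missing or if it has no img.jpg inside.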
When the above is ready, run the following to start training:
bash scripts/train.sh
The learned token embeddings of all concepts are saved to .bin files under your --output_dir.
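To locate the saved embeddings after training, a small script can list every .bin file below the output folder. This is only a sketch: the helper name list_embedding_files is hypothetical, and because the exact file names and subfolder layout are not specified here, it searches recursively and returns any .bin file it finds, which may include files other than token embeddings.

```python
from pathlib import Path
from typing import List


def list_embedding_files(output_dir: str) -> List[str]:
    """Return the sorted paths of all .bin files found under output_dir.

    Assumption: the search is recursive because the layout inside
    --output_dir is not documented; unrelated .bin files are not filtered out.
    """
    return sorted(str(p) for p in Path(output_dir).rglob("*.bin"))
```

Any of the returned paths that holds learned token embeddings can then be used as the embedding file for inference.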
Once trained, the i-th concept is represented as <asset$i> in the tokenizer. You can then freely generate images using any concept token <asset$i> (replace $i with a valid concept index):
python infer.py \
--embed_path $CKPT_BIN_FILE \
--prompt "a photo of <asset$i> in the snow" \
--save_path $SAVE_FOLDER \
--seed 0
Set $CKPT_BIN_FILE to the .bin file path of your learned token embeddings, and $SAVE_FOLDER to the folder where the generated images should be saved. You can also find inference examples in scripts/infer.sh.
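When generating images for several concepts, the command above has to be repeated with a different concept index each time. The sketch below assembles that command as an argument list. The helper name build_infer_command is hypothetical; it uses only the flags shown above (--embed_path, --prompt, --save_path, --seed) and the example prompt from this README. Which concept indices are valid depends on your training run and is not assumed here.

```python
from typing import List


def build_infer_command(embed_path: str, concept_index: int,
                        save_path: str, seed: int = 0) -> List[str]:
    """Build the infer.py argument list for one concept token.

    Hypothetical helper: it only mirrors the command shown in this README,
    replacing $i in <asset$i> with the given concept index.
    """
    prompt = f"a photo of <asset{concept_index}> in the snow"
    return [
        "python", "infer.py",
        "--embed_path", embed_path,
        "--prompt", prompt,
        "--save_path", save_path,
        "--seed", str(seed),
    ]
```

The returned list can be passed to subprocess.run from the repository root, once per concept index you want to render.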
If you use this code in your research, please consider citing our paper:
@InProceedings{hao2024conceptexpress,
title={Concept{E}xpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction},
author={Shaozhe Hao and Kai Han and Zhengyao Lv and Shihao Zhao and Kwan-Yee~K. Wong},
booktitle={ECCV},
year={2024},
}
This code repository is based on the great work of Break-A-Scene. Thanks!