
TweedieMix

Official source code for "TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation"

Environment

$ git clone https://github.com/KwonGihyun/TweedieMix
$ cd TweedieMix
$ conda create -n tweediemix python=3.11.8
$ conda activate tweediemix
$ pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
$ pip install -r requirements.txt
$ cd text_segment
$ git clone https://github.com/IDEA-Research/GroundingDINO.git
$ cd GroundingDINO/
$ pip install -e .

Download and install an xformers wheel compatible with your environment from https://download.pytorch.org/whl/xformers/, for example:

$ pip install xformers-0.0.26+cu118-cp311-cp311-manylinux2014_x86_64.whl

If you have problems installing GroundingDINO, please refer to the original repository.
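After installation, a quick sanity check confirms that the core dependencies import and that CUDA is visible (standard one-liners, not scripts shipped with this repository):

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
$ python -c "import xformers; print(xformers.__version__)"
$ python -c "import groundingdino"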

Preliminary: Single-concept Training

Train a single-concept-aware model using the Custom Diffusion framework:

bash singleconcept_train.sh

Most of the concept datasets can be downloaded from the CustomConcept101 dataset.

We provide both Custom Diffusion (key/value weight fine-tuning) and Low-Rank Adaptation (LoRA) training.

We also provide several pre-trained weights in LINK
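For reference, a single-concept run in the style of the diffusers Custom Diffusion example looks roughly like the following; the base model name, data paths, and the <new1> modifier token are illustrative, and the exact flags used by singleconcept_train.sh may differ:

$ accelerate launch train_custom_diffusion.py \
    --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
    --instance_data_dir="./data/cat" \
    --output_dir="./ckpt/cat" \
    --instance_prompt="photo of a <new1> cat" \
    --modifier_token="<new1>" \
    --freeze_model="crossattn_kv" \
    --resolution=512 \
    --train_batch_size=2 \
    --learning_rate=1e-5 \
    --max_train_steps=250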

Multi-Concept Generation

Sample multi-concept-aware images with the personalized checkpoints.

We provide scripts for multi-concept sampling, for example:

bash sample_catdog.sh

We also provide several scripts for multi-concept generation with either Custom Diffusion or LoRA weights.

For different generation settings, adjust the parameters in the bash file; a hypothetical sketch of such a file follows.
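The entry-point script and every flag name below are assumptions made for illustration, not the repository's actual interface; open one of the provided scripts, e.g. sample_catdog.sh, for the real arguments:

# Hypothetical sampling script -- all names below are illustrative;
# see sample_catdog.sh for the actual entry point and flags.
python sample.py \
    --prompt "a <new1> cat and a <new2> dog playing together" \
    --concept_ckpts ./ckpt/cat ./ckpt/dog \
    --num_inference_steps 50 \
    --guidance_scale 7.5 \
    --seed 42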

Video Generation

After generating a multi-concept image, generate the video output using an image-to-video diffusion model.

We provide a script for I2V generation:

python run_video.py

For different generation settings, adjust the parameters in the script file.
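Putting the pieces together, the whole pipeline runs in three steps (script names as above):

$ bash singleconcept_train.sh   # 1. train per-concept weights (once per concept)
$ bash sample_catdog.sh         # 2. sample a multi-concept image
$ python run_video.py           # 3. animate the image with the I2V model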

📋 Results

Multi-Concept Image Generation Results


Multi-Concept Video Generation Results

Video Customization Comparison


More Videos

References

If you find this paper useful for your research, please consider citing:

@article{kwon2024tweedie,
  title   = {TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation},
  author  = {Kwon, Gihyun and Ye, Jong Chul},
  journal = {arXiv preprint arXiv:2410.05591},
  year    = {2024}
}

Please also refer to our previous work, Concept Weaver:

@InProceedings{kwon2024concept,
    author    = {Kwon, Gihyun and Jenni, Simon and Li, Dingzeyu and Lee, Joon-Young and Ye, Jong Chul and Heilbron, Fabian Caba},
    title     = {Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8880-8889}
}

Acknowledgement

Our source code is based on Plug-and-Play Diffusion, Custom Diffusion, and LangSAM.