
🚀 Dimba: Transformer-Mamba Diffusion Models


This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper Transformer-Mamba Diffusion Models. You can find more visualizations on our project page.

TL;DR: Dimba is a new text-to-image diffusion model that employs a hybrid architecture combining Transformer and Mamba elements, capitalizing on the advantages of both architectural paradigms.
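As a rough illustration of this hybrid design, the sketch below interleaves a self-attention layer with a Mamba layer in PyTorch. It is a minimal toy that assumes the upstream `mamba_ssm` package; the actual Dimba block layout, normalization, and text conditioning follow the paper, not this sketch.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # from the mamba package installed in step 1

class HybridBlock(nn.Module):
    """Toy hybrid block: self-attention followed by a Mamba layer, each residual."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mamba = Mamba(d_model=dim)  # linear-time state-space sequence mixer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) latent patch tokens
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # global token mixing
        x = x + self.mamba(self.norm2(x))                  # sequential state-space scan
        return x

tokens = torch.randn(2, 256, 512, device="cuda")  # Mamba kernels require CUDA
out = HybridBlock(dim=512).to("cuda")(tokens)
assert out.shape == tokens.shape
```

The attention layer mixes tokens globally while the Mamba scan runs in linear time over the sequence; balancing these two costs is the point of the hybrid design.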


Figure: some generated cases.


1. Environments

  • Python 3.10

    • conda create -n your_env_name python=3.10
  • Requirements file

    • pip install -r requirements.txt
  • Install causal_conv1d and mamba (a quick import check follows this list)

    • pip install -e causal_conv1d
    • pip install -e mamba
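
After installation, a quick sanity check can confirm that both CUDA extensions import and run. This assumes the packages expose `causal_conv1d_fn` and `Mamba` as in their upstream repos:

```python
import torch
from causal_conv1d import causal_conv1d_fn
from mamba_ssm import Mamba

x = torch.randn(1, 64, 128, device="cuda")  # (batch, dim, seqlen) for the conv kernel
w = torch.randn(64, 4, device="cuda")       # (dim, conv_width)
print(causal_conv1d_fn(x, w).shape)         # expect torch.Size([1, 64, 128])

m = Mamba(d_model=64).to("cuda")
print(m(x.transpose(1, 2)).shape)           # Mamba expects (batch, seqlen, dim)
```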

2. Download Models

Models reported in the paper can be downloaded directly as follows (upload in progress):

| Model | #Params | URL |
| --- | --- | --- |
| t5 | 4.3B | huggingface |
| vae | 80M | huggingface |
| Dimba-L-512 | 0.9B | huggingface |
| Dimba-L-1024 | 0.9B | - |
| Dimba-L-2048 | 0.9B | - |
| Dimba-G-512 | 1.8B | - |
| Dimba-G-1024 | 1.8B | - |
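
Once the checkpoint links are live, weights can also be fetched programmatically with `huggingface_hub`. The repo id and local path below are placeholders for the actual ones behind the table links:

```python
from huggingface_hub import snapshot_download

# "feizhengcong/Dimba" is a placeholder: use the repo id behind the table link.
ckpt_dir = snapshot_download(repo_id="feizhengcong/Dimba",
                             local_dir="checkpoints/dimba-l-512")
print("checkpoint downloaded to:", ckpt_dir)
```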

The dataset used for quality tuning to enhance aesthetic performance can be downloaded as follows:

| Dataset | Size | URL |
| --- | --- | --- |
| Quality tuning | 600k | huggingface |
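
Likewise, the quality-tuning data can be loaded with the `datasets` library; the dataset id below is a placeholder for the linked one:

```python
from datasets import load_dataset

# Placeholder dataset id: use the id behind the "huggingface" link above.
ds = load_dataset("feizhengcong/dimba-quality-tuning", split="train")
print(ds)
```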

3. Inference

We include an inference script that samples images from a Dimba model according to textual prompts. It supports the DDIM and DPM-Solver sampling algorithms. You can run the script as follows:

python scripts/inference.py \
--image_size 512 \
--model_version dimba-l \
--model_path /path/to/model \
--txt_file asset/examples.txt \
--save_path /path/to/save/results
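
To sample from several prompt files in one go, a small wrapper can reuse the same CLI; the second prompt file and the output layout below are illustrative:

```python
import subprocess

# "asset/more_prompts.txt" is illustrative; point these at your own prompt files.
for txt_file in ["asset/examples.txt", "asset/more_prompts.txt"]:
    name = txt_file.rsplit("/", 1)[-1].removesuffix(".txt")
    subprocess.run([
        "python", "scripts/inference.py",
        "--image_size", "512",
        "--model_version", "dimba-l",
        "--model_path", "/path/to/model",
        "--txt_file", txt_file,
        "--save_path", f"results/{name}",
    ], check=True)
```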

4. Training

We provide a training script for Dimba in scripts/train.py. This script can be used for fine-tuning with different settings. You can run the script as follows:

python -m torch.distributed.launch --nnodes=4 --nproc_per_node=8 \
    --master_port=1234 scripts/train.py \
    configs/dimba_xl2_img512.py \
    --work-dir outputs
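
The first positional argument is a Python config file. The sketch below only illustrates the kind of fields such PixArt-style configs typically carry; every name here is an assumption, so consult configs/dimba_xl2_img512.py for the actual keys:

```python
# Illustrative only: all field names below are assumptions modeled on
# PixArt-style configs, not the real contents of configs/dimba_xl2_img512.py.
image_size = 512                       # training resolution
train_batch_size = 32                  # per-GPU batch size
num_epochs = 20
lr = 2e-5
load_from = "checkpoints/dimba-l-512"  # fine-tune from a downloaded checkpoint
work_dir = "outputs"                   # overridden by --work-dir on the CLI
```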

5. BibTeX

@misc{fei2024dimba,
    title={Dimba: Transformer-Mamba Diffusion Models}, 
    author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Debang Li and Youqiang Zhang and Junshi Huang},
    year={2024},
    eprint={2406.01159},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

6. Acknowledgments

The codebase is based on the awesome PixArt, Vim, and DiS repos.

The Dimba paper was polished with ChatGPT using a prompt.