
Code for the paper: Decoupled Diffusion Models: Image to Zero and Zero to Noise


DDM

Decoupled Diffusion Models: Simultaneous Image to Zero and Zero to Noise (arXiv paper)

[Figures: Teaser, Framework]

News

  • Updated ddm_const_2, which replaces the noise schedule √t with t.
  • 2024-02-27: This work inspired DiffMOT, a multiple object tracking method accepted to CVPR 2024.
  • 2023-12-09: This work inspired DiffusionEdge, an edge detection method accepted to AAAI 2024.
  • Added training for text-to-image generation; please refer to text-2-img.
  • The two-branch UNet has been modified into a single-decoder UNet architecture.
    The single-decoder UNet is available in uncond-unet-sd and cond-unet-sd.
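To illustrate what the ddm_const_2 change means, here is a minimal sketch of the forward (image-to-noise) process. It assumes our reading of the paper's constant-attenuation setting, x_t = (1 - t) * x_0 + σ(t) * ε with continuous time t in [0, 1], where σ(t) = √t originally and σ(t) = t after the update. The function `forward_sample` and its `schedule` argument are hypothetical and are not the repository's code.

```python
import math
import random


def forward_sample(x0, eps, t, schedule="sqrt"):
    """Hypothetical sketch of x_t = (1 - t) * x0 + sigma(t) * eps.

    schedule="sqrt"   -> sigma(t) = sqrt(t)  (original constant setting, as we read it)
    schedule="linear" -> sigma(t) = t        (the ddm_const_2 change)
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if schedule == "sqrt":
        sigma = math.sqrt(t)
    elif schedule == "linear":
        sigma = t
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    # The image term decays linearly to zero while the noise term grows.
    return [(1.0 - t) * x + sigma * e for x, e in zip(x0, eps)]


if __name__ == "__main__":
    random.seed(0)
    x0 = [0.5, -0.25, 1.0]
    eps = [random.gauss(0.0, 1.0) for _ in x0]
    for t in (0.0, 0.25, 1.0):
        print(t, forward_sample(x0, eps, t, "sqrt"), forward_sample(x0, eps, t, "linear"))
```

Both schedules agree at t = 0 (pure image) and t = 1 (pure noise); since √t > t for 0 < t < 1, the linear schedule injects less noise at intermediate times.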

I. Before Starting.

  1. Install PyTorch:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  2. Install the other packages:
pip install -r requirement.txt
  3. Prepare the Accelerate config:
accelerate config
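An optional sanity check after installation: this sketch compares the installed versions against the pins in the install command, using only `importlib.metadata` from the standard library. The helper names are hypothetical, and the comparison is a plain string match (not full PEP 440 logic) that ignores a local label such as "+cu113" when the pin has none.

```python
from importlib import metadata

# Versions pinned by the install command above (CUDA 11.3 wheels).
PINNED = {
    "torch": "1.12.1+cu113",
    "torchvision": "0.13.1+cu113",
    "torchaudio": "0.12.1",
}


def version_matches(expected, installed):
    """String comparison; drop a local label like '+cu113' unless the pin has one."""
    if installed is None:
        return False
    if "+" not in expected:
        installed = installed.split("+", 1)[0]
    return installed == expected


def check_pins(pins=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if not version_matches(expected, installed):
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    if not problems:
        print("All pinned packages match.")
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
```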

II. Prepare Data.

The file structure should look like:
(a) unconditional cifar10:

cifar-10-python
|-- cifar-10-batches-py
|   |-- data_batch_1
|   |-- data_batch_2
|   |-- XXX
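A minimal standard-library sketch for reading the CIFAR-10 batch files in layout (a). It follows the publicly documented CIFAR-10 python format: each `data_batch_N` is a pickled dict whose `b"data"` holds one 3072-value row per image (red plane, then green, then blue, each 32x32 in row-major order) and whose `b"labels"` holds class indices. The helper names are hypothetical; the repository's own data loader may read the files differently.

```python
import pickle
from pathlib import Path

IMG_SIZE = 32
PLANE = IMG_SIZE * IMG_SIZE  # 1024 values per colour channel


def load_batch(path):
    """Load one CIFAR-10 python batch (e.g. data_batch_1).

    Unpickling the official files needs numpy installed, because b"data" is a
    numpy array; plain sequences of ints work as well.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], list(batch[b"labels"])


def row_to_pixels(row):
    """Convert a 3072-value row (R, G, B planes) into a 32x32 grid of (r, g, b) tuples."""
    return [
        [
            (
                int(row[y * IMG_SIZE + x]),
                int(row[PLANE + y * IMG_SIZE + x]),
                int(row[2 * PLANE + y * IMG_SIZE + x]),
            )
            for x in range(IMG_SIZE)
        ]
        for y in range(IMG_SIZE)
    ]


if __name__ == "__main__":
    batch_file = Path("cifar-10-python/cifar-10-batches-py/data_batch_1")
    if batch_file.exists():
        data, labels = load_batch(batch_file)
        print(len(labels), labels[0], row_to_pixels(data[0])[0][0])
```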

(b) unconditional CelebA-HQ:

celebahq
|-- celeba_hq_256
|   |-- 00000.jpg
|   |-- 00001.jpg
|   |-- XXXXX.jpg

(c) conditional DIV2K:

DIV2K
|-- DIV2K_train_HR
|   |-- 0001.png
|   |-- 0002.png
|   |-- XXXX.png
|-- DIV2K_valid_HR
|   |-- 0801.png
|   |-- 0802.png
|   |-- XXXX.png

(d) conditional DUTS:

DUTS
|-- DUTS-TR
|   |-- DUTS-TR-Image
|   |   |-- XXX.jpg
|   |-- DUTS-TR-Mask
|   |   |-- XXX.png
|-- DUTS-TE
|   |-- DUTS-TE-Image
|   |   |-- XXX.jpg
|   |-- DUTS-TE-Mask
|   |   |-- XXX.png
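In layout (d), each DUTS image appears to have a mask with the same file stem (`XXX.jpg` in the Image folder, `XXX.png` in the Mask folder). A small sketch that checks this pairing before training; `pair_duts` is a hypothetical helper, and the repository's dataset class may match files differently.

```python
from pathlib import Path


def pair_duts(split_dir, split="TR"):
    """Pair DUTS images with masks by file stem.

    Expects split_dir/DUTS-{split}-Image/*.jpg and split_dir/DUTS-{split}-Mask/*.png.
    Returns (pairs, images_without_mask), both sorted by image name.
    """
    split_dir = Path(split_dir)
    image_dir = split_dir / f"DUTS-{split}-Image"
    mask_dir = split_dir / f"DUTS-{split}-Mask"
    masks = {p.stem: p for p in mask_dir.glob("*.png")}
    pairs, missing = [], []
    for image in sorted(image_dir.glob("*.jpg")):
        mask = masks.get(image.stem)
        if mask is None:
            missing.append(image)
        else:
            pairs.append((image, mask))
    return pairs, missing
```

For example, `pair_duts("DUTS/DUTS-TE", "TE")` would check the test split.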

III. Unconditional training in image space on the CIFAR-10 dataset.

accelerate launch train_uncond_dpm.py --cfg ./configs/cifar10/ddm_uncond_const_uncond_unet.yaml

IV. Unconditional training in latent space on the CelebA-HQ (256x256) dataset.

  1. Train the auto-encoder:
accelerate launch train_vae.py --cfg ./configs/celebahq/celeb_ae_kl_256x256_d4.yaml
  2. Add the path to the auto-encoder weights from step 1 to the config file ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml (line 41), then train the latent diffusion model:
accelerate launch train_uncond_ldm.py --cfg ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml

V. Conditional training in latent space on the DIV2K dataset (super-resolution as an example).

  1. Train the auto-encoder:
accelerate launch train_vae.py --cfg ./configs/super-resolution/div2k_ae_kl_512x512_d4.yaml
  2. Train the latent diffusion model:
accelerate launch train_cond_ldm.py --cfg ./configs/super-resolution/div2k_cond_ddm_const_ldm.yaml

VI. Conditional training in image space (saliency detection as an example).

accelerate launch train_cond_dpm.py --cfg ./configs/saliency/DUTS_ddm_const_dpm_114.yaml
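The training entry points in sections III to VI can be collected in one place. This sketch only builds and optionally runs the `accelerate launch` commands listed above; the task names and the `run` helper are hypothetical. Because latent-space tasks need the auto-encoder weights added to the LDM config between stages, `run` defaults to a dry run and lets you execute one stage at a time.

```python
import shlex
import subprocess

# Scripts and configs copied from sections III-VI; latent tasks have two stages.
TASKS = {
    "cifar10_uncond_image": [
        ("train_uncond_dpm.py", "./configs/cifar10/ddm_uncond_const_uncond_unet.yaml"),
    ],
    "celebahq_uncond_latent": [
        ("train_vae.py", "./configs/celebahq/celeb_ae_kl_256x256_d4.yaml"),
        ("train_uncond_ldm.py", "./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml"),
    ],
    "div2k_sr_latent": [
        ("train_vae.py", "./configs/super-resolution/div2k_ae_kl_512x512_d4.yaml"),
        ("train_cond_ldm.py", "./configs/super-resolution/div2k_cond_ddm_const_ldm.yaml"),
    ],
    "duts_saliency_image": [
        ("train_cond_dpm.py", "./configs/saliency/DUTS_ddm_const_dpm_114.yaml"),
    ],
}


def build_commands(task):
    """Return the `accelerate launch` argument lists for one task, in stage order."""
    return [["accelerate", "launch", script, "--cfg", cfg] for script, cfg in TASKS[task]]


def run(task, stage=None, dry_run=True):
    """Print (and optionally execute) a task's commands.

    stage selects one 0-based stage; None means every stage in order.
    """
    commands = build_commands(task)
    if stage is not None:
        commands = [commands[stage]]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the stage fails.
            subprocess.run(cmd, check=True)
```

For example, `run("celebahq_uncond_latent", stage=0, dry_run=False)` trains the auto-encoder; edit the LDM config, then run `stage=1`.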

VII. Faster Sampling

Set the number of sampling steps with "sampling_timesteps" in the config file; fewer steps sample faster.
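Where `sampling_timesteps` sits in each YAML config may differ, so this sketch rewrites the value with a line-based regular expression instead of assuming a nesting structure. It assumes the key appears as a plain integer on a single line; `set_sampling_timesteps` and `patch_config` are hypothetical helpers, and which step counts work well is not specified here.

```python
import re
from pathlib import Path

# Matches e.g. "    sampling_timesteps: 250  # comment", keeping indentation and comment.
_PATTERN = re.compile(r"^([ \t]*sampling_timesteps[ \t]*:[ \t]*)(\d+)(.*)$", re.MULTILINE)


def set_sampling_timesteps(text, steps):
    """Return YAML text with every sampling_timesteps value replaced by `steps`."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    new_text, count = _PATTERN.subn(lambda m: f"{m.group(1)}{steps}{m.group(3)}", text)
    if count == 0:
        raise KeyError("sampling_timesteps not found in config")
    return new_text


def patch_config(path, steps):
    """Rewrite the config file in place."""
    path = Path(path)
    path.write_text(set_sampling_timesteps(path.read_text(), steps))
```

Using PyYAML would be more robust for unusual configs, at the cost of dropping comments when the file is written back.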

  1. Unconditional generation:
python sample_uncond.py --cfg ./configs/cifar10/ddm_uncond_const_uncond_unet.yaml
python sample_uncond.py --cfg ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml
  2. Conditional generation (latent space model):
  • Super-resolution:
python ./eval_downstream/eval_sr.py --cfg ./configs/super-resolution/div2k_sample.yaml
  • Inpainting:
python ./eval_downstream/sample_inpainting.py --cfg ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm_sample.yaml
  • Saliency:
python ./eval_downstream/eval_saliency.py --cfg ./configs/saliency/DUTS_sample_114.yaml

VIII. Training for Text-2-Image

  1. Download the LAION data from laion.
  2. Download the data listed in the metadata using img2dataset; please refer to here.
  3. Install CLIP:
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
  4. The final data structure looks like:
|-- laion
|   |-- 00000.tar
|   |-- 00001.tar
|   |-- XXXXX.tar
  5. Train with the text-2-img config file:
accelerate launch train_cond_ldm.py --cfg ./configs/text2img/ddm_uncond_const.yaml
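To inspect the `.tar` shards in the laion folder, this sketch groups the files inside one shard into (key, image, caption) samples using the standard-library `tarfile` module. It assumes img2dataset's webdataset-style output, where the files of one sample share a key (for example `000000001.jpg` and `000000001.txt`); `iter_samples` is a hypothetical helper, not the repository's text-to-image data loader, and it reads the whole shard into memory.

```python
import tarfile
from collections import defaultdict


def iter_samples(shard_path, image_ext="jpg", caption_ext="txt"):
    """Yield (key, image_bytes, caption) for samples that have both files.

    Assumes webdataset-style shards: <key>.<image_ext> plus <key>.<caption_ext>.
    """
    grouped = defaultdict(dict)
    with tarfile.open(shard_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            key, _, ext = member.name.rpartition(".")
            if ext in (image_ext, caption_ext):
                grouped[key][ext] = tar.extractfile(member).read()
    # The archive is closed here; all needed bytes are already in memory.
    for key in sorted(grouped):
        files = grouped[key]
        if image_ext in files and caption_ext in files:
            yield key, files[image_ext], files[caption_ext].decode("utf-8").strip()
```

For example, `next(iter_samples("laion/00000.tar"))` returns the first complete sample in that shard.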

Note that the pretrained AutoEncoder weights can be downloaded from here; unzip the file after downloading.

Pretrained Weights

Task            Weight  Config
Uncond-Cifar10  url     url
Uncond-Celeb    url     url

Contact

If you have any questions, please contact huangai@nudt.edu.cn.

Thanks

Thanks to the public repositories DDPM and LDM, which provided the base code.

Citation

@article{huang2023decoupled,
  title={Decoupled Diffusion Models: Simultaneous Image to Zero and Zero to Noise},
  author={Huang, Yuhang and Qin, Zheng and Liu, Xinwang and Xu, Kai},
  journal={arXiv preprint arXiv:2306.13720},
  year={2023}
}