DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

This repo contains the official implementation of the paper DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents by Kushagra Pandey, Avideep Mukherjee, Piyush Rai, and Abhishek Kumar.

Overview

DiffuseVAE is a novel generative framework that integrates a standard VAE within a diffusion model by conditioning the diffusion model samples on the VAE-generated reconstructions. The resulting model significantly improves upon the blurry samples generated by a standard VAE, while also equipping diffusion models with a low-dimensional, VAE-inferred latent code that can be used for downstream tasks like controllable synthesis and image attribute manipulation. In short, DiffuseVAE is a generative model that combines the benefits of both VAEs and diffusion models.

Our core contributions are as follows:

We propose a generic DiffuseVAE conditioning framework and show that our framework can be reduced to a simple generator-refiner framework in which blurry samples generated from a VAE are refined using a conditional DDPM formulation.
Controllable synthesis from a low-dimensional latent using diffusion models.
Better speed vs quality tradeoffs: We show that DiffuseVAE inherently provides a better speed vs quality tradeoff than standard DDPM/DDIM models on several image benchmarks.
State-of-the-art synthesis: We show that DiffuseVAE exhibits synthesis quality comparable to recent state-of-the-art on standard image synthesis benchmarks like CIFAR-10, CelebA-64 and CelebA-HQ while maintaining access to a low-dimensional latent code representation.
Generalization to noisy conditioning signals: We show that a pre-trained DiffuseVAE model generalizes to different noise types in the DDPM conditioning signal, which demonstrates the effectiveness of our conditioning framework.
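The generator-refiner idea from the first contribution can be illustrated with a toy sketch. This is not the repo's implementation: `toy_vae_decode` and `toy_noise_predictor` are hypothetical stand-ins for the trained VAE decoder and the conditional DDPM denoiser, the linear noise schedule is an assumed default, and "conditioning" here simply means passing the VAE output to the denoiser as an extra argument.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (an assumed, commonly used default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_vae_decode(z):
    """Hypothetical stand-in for a VAE decoder: latent -> 'blurry' sample."""
    return [math.tanh(v) for v in z]


def toy_noise_predictor(x_t, x_hat, alpha_bar_t):
    """Hypothetical stand-in for the conditional denoiser eps(x_t, x_hat, t).

    It returns the noise that would be exact if the clean sample were x_hat,
    so the reverse process below simply converges back to x_hat.
    """
    scale = math.sqrt(1.0 - alpha_bar_t)
    return [(xt - math.sqrt(alpha_bar_t) * xh) / scale
            for xt, xh in zip(x_t, x_hat)]


def refine(x_hat, num_steps=50, rng=None):
    """Stage 2: DDPM ancestral sampling conditioned on the stage-1 output."""
    rng = rng or random.Random(0)
    betas = linear_betas(num_steps)
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in x_hat]  # start from pure noise x_T
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps = toy_noise_predictor(x, x_hat, alpha_bar)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(beta)  # one standard choice of reverse variance
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def sample(latent_dim=8, num_steps=50, seed=0):
    """Stage 1 (sample z, decode) followed by stage 2 (conditional refinement)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    x_hat = toy_vae_decode(z)
    x = refine(x_hat, num_steps, rng)
    return z, x_hat, x
```

Because the toy denoiser treats the VAE output as the clean sample, the refined result here matches `x_hat`. With a trained conditional denoiser in its place, the second stage would instead add the detail that the VAE sample lacks, while `z` remains available as the low-dimensional latent code.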

Code overview

This repo uses PyTorch Lightning for training and Hydra for config management, so basic familiarity with both tools is expected. Please clone the repo and use the DiffuseVAE directory as the working directory for all downstream tasks such as setting up the dependencies, training, and inference.

Setting up the dependencies

We use pipenv for project-level dependency management. Simply install pipenv and run the following command:

pipenv install

Config Management

We manage train and test configurations separately for each benchmark/dataset used in this work. All configs are present in the main/configs directory. This directory has subfolders named according to the dataset. Each dataset subfolder contains the training and evaluation configs as train.yaml and test.yaml.
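The layout described above can be expressed as a small path helper. This is only a sketch of the directory convention, not code from the repo: the function names are made up for illustration, and the dataset folder name `cifar10` in the usage note is a hypothetical example, since the actual subfolder names are not listed here.

```python
from pathlib import Path

# Config root as stated above, relative to the repo's working directory.
CONFIG_ROOT = Path("main/configs")


def config_paths(dataset, root=CONFIG_ROOT):
    """Return the expected train/test config paths for one dataset subfolder."""
    dataset_dir = Path(root) / dataset
    return {
        "train": dataset_dir / "train.yaml",
        "test": dataset_dir / "test.yaml",
    }


def available_datasets(root=CONFIG_ROOT):
    """List dataset subfolders that contain both train.yaml and test.yaml."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir()
        and (p / "train.yaml").is_file()
        and (p / "test.yaml").is_file()
    )
```

For example, `config_paths("cifar10")["train"]` would point at `main/configs/cifar10/train.yaml` if a folder with that name exists; `available_datasets()` shows which names are actually present in a checkout.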

Note: The configuration files consist of many command line options. The meaning of each of these options is explained in the config for CIFAR-10.

Training

The table below lists scripts for some of the training tasks supported by the code.

Task	Reference
Training First stage VAE	`scripts/train_ae.sh`
Training Second stage DDPM	`scripts/train_ddpm.sh`
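The two training stages can also be launched in order from Python. The sketch below only wraps the two scripts from the table; it assumes they can be run with `bash` from the repo root without extra arguments, which may not hold if the scripts need dataset or checkpoint paths edited first. The function names are hypothetical.

```python
import subprocess

# Stage order follows the table above: first-stage VAE, then second-stage DDPM.
TRAINING_STAGES = [
    ("first-stage VAE", "scripts/train_ae.sh"),
    ("second-stage DDPM", "scripts/train_ddpm.sh"),
]


def training_commands():
    """Build one command per stage (assumes the scripts take no arguments)."""
    return [["bash", script] for _, script in TRAINING_STAGES]


def run_training(dry_run=True):
    """Print each stage's command, and execute it unless dry_run is set."""
    for (name, _), cmd in zip(TRAINING_STAGES, training_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the pipeline if a stage fails.
            subprocess.run(cmd, check=True)
```

Calling `run_training()` only prints the commands; pass `dry_run=False` to actually run them.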

Inference

The table below lists scripts for some of the inference tasks supported by the code.

Task	Reference
Sample/Reconstruct from Baseline VAE	`scripts/test_ae.sh`
Sample from DiffuseVAE	`scripts/test_ddpm.sh`
Generate reconstructions from DiffuseVAE	`scripts/test_recons_ddpm.sh`
Interpolate in the VAE/DDPM latent space using DiffuseVAE	`scripts/interpolate.sh`
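As background for the interpolation task, the sketch below shows two common ways to interpolate between latent codes: linear and spherical. It is a generic illustration, and the scheme actually used by `scripts/interpolate.sh` may differ. All function names are hypothetical.

```python
import math


def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def slerp(z0, z1, t, eps=1e-8):
    """Spherical interpolation; falls back to lerp in degenerate cases."""
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    if norm0 < eps or norm1 < eps:
        return lerp(z0, z1, t)
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:  # (anti)parallel vectors: the arc is ill-defined
        return lerp(z0, z1, t)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolation_path(z0, z1, num_points, interpolate=slerp):
    """Return num_points latents from z0 to z1, endpoints included."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    return [interpolate(z0, z1, i / (num_points - 1)) for i in range(num_points)]
```

Each latent on the path would then be decoded and refined like any other sample. Spherical interpolation is often preferred for Gaussian latents because it keeps intermediate points at a norm similar to the endpoints.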

To compute the evaluation metrics (FID, IS, etc.), we use the torch-fidelity package. See scripts/fid.sh for sample usage examples.
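torch-fidelity also exposes a Python entry point, `torch_fidelity.calculate_metrics`. The sketch below shows one way it could be called; it is not taken from `scripts/fid.sh`, and the helper names, the `samples/` directory, and the choice of reference set are assumptions made for illustration.

```python
def metric_kwargs(samples_dir, reference, use_cuda=False):
    """Build keyword arguments for torch_fidelity.calculate_metrics."""
    return {
        "input1": samples_dir,  # directory of generated images
        "input2": reference,    # reference image directory or registered dataset name
        "cuda": use_cuda,
        "isc": True,            # Inception Score
        "fid": True,            # Frechet Inception Distance
        "verbose": False,
    }


def compute_metrics(samples_dir, reference, use_cuda=False):
    """Run torch-fidelity and return its dictionary of metric values."""
    # Imported lazily so this module loads even when torch-fidelity is absent.
    import torch_fidelity

    return torch_fidelity.calculate_metrics(
        **metric_kwargs(samples_dir, reference, use_cuda)
    )
```

A call such as `compute_metrics("samples/", "cifar10-train")` would compare a hypothetical folder of generated images against torch-fidelity's registered CIFAR-10 training set and return a dictionary of metric values.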

Pretrained checkpoints

All pretrained checkpoints have been organized by dataset and can be accessed here.

Citing

To cite DiffuseVAE, please use the following BibTeX entries:

@misc{pandey2022diffusevae,
      title={DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents}, 
      author={Kushagra Pandey and Avideep Mukherjee and Piyush Rai and Abhishek Kumar},
      year={2022},
      eprint={2201.00308},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@inproceedings{
pandey2021vaes,
title={{VAE}s meet Diffusion Models: Efficient and High-Fidelity Generation},
author={Kushagra Pandey and Avideep Mukherjee and Piyush Rai and Abhishek Kumar},
booktitle={NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications},
year={2021},
url={https://openreview.net/forum?id=-J8dM4ed_92}
}

Since our model builds on diffusion models, please also consider citing the original diffusion model, DDPM, and VAE papers.

Contact

Kushagra Pandey (@kpandey008)
