clear-diffusion-keras

Implementation of denoising diffusion models with schedules, improved sampling, and other extensions using Keras.


Modular and Readable Denoising Diffusion Models in Keras

flowers stochastic generation

Diffusion models are trained to denoise noisy images, and can generate images by iteratively denoising pure noise.
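The iterative denoising loop can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's actual code: the simple trigonometric rate schedule and the `denoise_fn` signature are assumptions for demonstration.

```python
import numpy as np

def generate(denoise_fn, num_steps, image_shape, seed=0):
    """Generate images by iteratively denoising pure noise (DDIM-style sketch).

    denoise_fn(noisy_images, noise_rate, signal_rate) -> (pred_noise, pred_image)
    is a stand-in for the trained denoising network.
    """
    rng = np.random.default_rng(seed)
    noisy = rng.normal(size=image_shape)  # start from pure noise
    step_size = 1.0 / num_steps
    for step in range(num_steps):
        t = 1.0 - step * step_size  # diffusion time runs from 1 down to 0
        # variance-preserving rates: signal_rate**2 + noise_rate**2 == 1
        noise_rate, signal_rate = np.sin(t * np.pi / 2), np.cos(t * np.pi / 2)
        pred_noise, pred_image = denoise_fn(noisy, noise_rate, signal_rate)
        # remix the denoised image with the predicted noise at the next step's rates
        t_next = t - step_size
        next_noise_rate = np.sin(t_next * np.pi / 2)
        next_signal_rate = np.cos(t_next * np.pi / 2)
        noisy = next_signal_rate * pred_image + next_noise_rate * pred_noise
    return noisy
```

At the final step the noise rate reaches zero, so the loop returns the network's last denoised image prediction.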


This repository contains a modular Keras implementation of denoising diffusion models, with multiple diffusion schedules and sampling techniques.

The network was optimized to offer reasonable performance with modest compute requirements (training time is below an hour on an A100). Other design choices are explained in detail in the corresponding Keras code example.

Sampling techniques

KID measured at different numbers of sampling steps with different sampling techniques, using the cosine schedule. Note that I selected the sampling hyperparameters using DDIM sampling with 5 diffusion steps.

sampling techniques

For first-order methods, the number of network evaluations equals the number of diffusion steps; for second-order methods, it is 2 * diffusion steps.
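The difference in evaluation counts can be illustrated with a toy solver sketch. The step functions and the dummy network below are illustrative assumptions, not the repository's samplers:

```python
class CountingNetwork:
    """Dummy denoiser that records how many times it is evaluated."""
    def __init__(self):
        self.evaluations = 0

    def __call__(self, x, t):
        self.evaluations += 1
        return -x  # placeholder prediction

def first_order_step(network, x, t, dt):
    # Euler-style update: one network evaluation per diffusion step
    return x + dt * network(x, t)

def second_order_step(network, x, t, dt):
    # Heun-style update: a predictor plus a corrector, two evaluations per step
    d1 = network(x, t)
    x_predicted = x + dt * d1
    d2 = network(x_predicted, t + dt)
    return x + 0.5 * dt * (d1 + d2)
```

Running both for the same number of diffusion steps, the second-order sampler evaluates the network exactly twice as often.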

Diffusion schedules

diffusion schedules

For this plot, I used 100 diffusion steps, with start_log_snr and end_log_snr set to 5.0 and -5.0 for symmetry (their defaults are 2.5 and -7.5).

For implementation details, check out diffusion_schedule() in model.py.
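As a rough illustration of what such a schedule computes, here is a minimal NumPy sketch that interpolates the log signal-to-noise ratio linearly between its start and end values and converts it to variance-preserving rates. The linear interpolation is an assumption for illustration; the repository implements several schedule shapes in diffusion_schedule().

```python
import numpy as np

def diffusion_schedule(diffusion_times, start_log_snr=2.5, end_log_snr=-7.5):
    """Map diffusion times in [0, 1] to signal and noise rates (sketch).

    The log-SNR is interpolated linearly, then converted so that
    signal_rate ** 2 + noise_rate ** 2 == 1 (variance preserving).
    """
    log_snr = start_log_snr + diffusion_times * (end_log_snr - start_log_snr)
    signal_rate = np.sqrt(1.0 / (1.0 + np.exp(-log_snr)))  # sqrt(sigmoid(log_snr))
    noise_rate = np.sqrt(1.0 / (1.0 + np.exp(log_snr)))    # sqrt(sigmoid(-log_snr))
    return signal_rate, noise_rate
```

With this parameterization, the log of the signal-to-noise power ratio recovers the interpolated log-SNR exactly, which is why the schedule endpoints are specified in log-SNR units.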

Generation quality

Kernel Inception Distance (KID):

Dataset \ Loss      mean absolute error (MAE)   mean squared error (MSE)
Oxford Flowers      0.282                       0.399
CelebA              0.148                       0.104
Caltech Birds       1.382                       1.697
CIFAR-10            0.217                       0.175

Network output \ Loss weighting   noise   velocity   signal
noise                             0.282   0.327      0.348
velocity                          0.299   0.290      0.333
signal                            0.291   0.319      0.329

Models were trained with the default hyperparameters unless noted otherwise; the hyperparameters were tuned on Oxford Flowers.

  • KID is a generative performance metric with a simple unbiased estimator, which makes it more suitable for limited numbers of images; it is also computationally cheaper to measure than the Frechet Inception Distance (FID).
  • The Inceptionv3 network's pretrained weights are loaded from Keras applications.
  • For computational efficiency, the images are evaluated at the minimal possible resolution (75x75 instead of 299x299), therefore the exact values might not be comparable with other implementations.
  • For computational efficiency, it is measured only on the validation splits of the datasets.
  • For computational efficiency, it is measured on images generated with only 5 diffusion steps.

Visualizations

All visualizations below were generated using:

  • 200 diffusion steps
  • DDPM sampling with large variance (stochasticity = 1.0, variance_preserving = False)
  • all other parameters left on default

Oxford Flowers 102

  • 6500 training images (80% of every split)
  • 64x64 resolution, center cropped

flowers generated images

CelebFaces Attributes (CelebA)

  • 160,000 training images
  • 64x64 resolution, center cropped

celeba generated images

Caltech Birds 2011 (CUB-200)

  • 6000 training images
  • 64x64 resolution, cropped on bounding boxes

birds generated images

CIFAR-10

  • 50,000 training images
  • 32x32 resolution

cifar10 generated images

For a similar implementation of GANs and GAN losses, check out this repository.