
denoising diffusion models, as simple as possible

primary language: python · license: MIT

diffusion for beginners

  • implementations of diffusion schedulers with minimal code, as faithful to the original work as i could make them. most recent work reuses or adapts code from previous work and builds on it, or transcribes code from another framework - which is great! but i found it hard to follow at times. this is an attempt at simplifying the great papers below. the trade-off is between stability and correctness on one hand and brevity and simplicity on the other.

$$\large{\mathbf{{\color{green}feel\ free\ to\ contribute\ to\ the\ list\ below!}}}$$

prompt: "a man eating an apple sitting on a bench"

dpm-solver++ | exponential integrator
heun | dpm-solver
ddim | pndm
ddpm | improved ddpm

* requirements *

while this repository is intended to be educational, if you wish to run and experiment, you'll need to obtain a token from huggingface (and paste it into generate_sample.py), and install their excellent diffusers library.
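if you'd rather not paste the token into a file, one option is to read it from an environment variable. this is only a sketch: the helper name `get_hf_token` and the choice of `HF_TOKEN` as the variable are my assumptions, and how generate_sample.py actually consumes the token is not shown here.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """return the huggingface token stored in an environment variable."""
    # hypothetical helper, not part of this repo
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"set {env_var} to your huggingface access token first")
    return token
```

the returned string can then be used wherever generate_sample.py expects the pasted token.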

* modification for heun sampler *

the heun sampler uses two neural function evaluations per step, and modifies the input as well as the sigma. i wanted to be as faithful to the paper as possible, which necessitated changing the sampling code a little. initialize the sampler as:

    sampler = HeunSampler(num_sample_steps=25, denoiser=pipe.unet, alpha_bar=pipe.scheduler.alphas_cumprod)
    init_latents = torch.randn(batch_size, 4, 64, 64).to(device) * sampler.t0

and replace the inner loop in generate_sample.py with:

    for t in tqdm(sampler.timesteps):
        latents = sampler(latents, t, text_embeddings, guidance_scale)
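to make "two neural function evaluations per step" concrete, here is a toy version of a heun step on the ode dx/dsigma = (x - D(x, sigma)) / sigma. it is a sketch under assumptions, not the repo's HeunSampler: `heun_step` and `toy_denoiser` are hypothetical names, the denoiser is a stand-in that always predicts the point 1.0 instead of calling a unet, and text conditioning and guidance are left out.

```python
def heun_step(x, sigma, sigma_next, denoise):
    """one heun (2nd-order) step of dx/dsigma = (x - denoise(x, sigma)) / sigma."""
    d = (x - denoise(x, sigma)) / sigma          # slope at the current noise level
    x_euler = x + (sigma_next - sigma) * d       # euler predictor (1st evaluation)
    if sigma_next == 0:
        return x_euler                           # slope is undefined at sigma = 0
    # 2nd evaluation: slope at the predicted point, at the next noise level
    d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
    return x + (sigma_next - sigma) * 0.5 * (d + d_next)  # average the two slopes


stats = {"calls": 0}


def toy_denoiser(x, sigma):
    """stand-in denoiser: pretends all data sits at the single point 1.0."""
    stats["calls"] += 1
    return 1.0


sigmas = [4.0, 2.0, 0.0]   # decreasing noise levels, ending at 0
x = 5.0                    # starting sample at the highest noise level
for s, s_next in zip(sigmas[:-1], sigmas[1:]):
    x = heun_step(x, s, s_next, toy_denoiser)
```

the first step calls the denoiser twice and the last step (down to sigma = 0) only once, so `stats["calls"]` ends at 3 and `x` lands on the data point 1.0.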

similarly, for dpm-solver-2,

    sampler = DPMSampler(num_sample_steps=20, denoiser=pipe.unet)
    init_latents = torch.randn(batch_size, 4, 64, 64).to(device) * sampler.lmbd(1)[1]
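dpm-solver works in terms of the half log-snr, lambda = log(alpha / sigma). what `sampler.lmbd` returns in this repo is not shown here, so the sketch below is only the textbook relation between a cumulative product alpha_bar and (alpha, sigma, lambda); the function name `half_log_snr` is hypothetical.

```python
import math


def half_log_snr(alpha_bar: float):
    """return (alpha, sigma, lambda) for a cumulative product alpha_bar in (0, 1)."""
    alpha = math.sqrt(alpha_bar)          # signal scale
    sigma = math.sqrt(1.0 - alpha_bar)    # noise scale, so alpha^2 + sigma^2 = 1
    lam = math.log(alpha / sigma)         # half log signal-to-noise ratio
    return alpha, sigma, lam
```

lambda grows as the sample gets cleaner, and it is exactly zero when signal and noise have equal scale (alpha_bar = 0.5).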

and, for the fast exponential integrator,

    sampler = ExponentialSampler(num_sample_steps=50, denoiser=pipe.unet)
    init_latents = torch.randn(batch_size, 4, 64, 64).to(device)

and, for dpm-solver++ (2m),

    sampler = DPMPlusPlusSampler(denoiser=pipe.unet, num_sample_steps=20)
    init_latents = torch.randn(batch_size, 4, 64, 64).to(device) * sampler.get_coeffs(sampler.t[0])[1]
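for orientation, this is the second-order multistep (2m) update as i read it in the dpm-solver++ paper, written for scalars. it is a sketch, not this repo's DPMPlusPlusSampler: `dpmpp_2m_step` and all argument names are hypothetical, `h` and `h_prev` are the current and previous steps in half log-snr, and the very first step (which has no previous prediction and uses a first-order update in the paper) is not covered.

```python
import math


def dpmpp_2m_step(x, x0_pred, x0_prev, alpha_t, sigma_t, sigma_s, h, h_prev):
    """one dpm-solver++(2m) update from time s to time t, data-prediction form."""
    r = h_prev / h                                   # ratio of consecutive step sizes
    # extrapolate the data prediction from the two most recent evaluations
    d = (1 + 1 / (2 * r)) * x0_pred - (1 / (2 * r)) * x0_prev
    # expm1(-h) = exp(-h) - 1, computed accurately for small h
    return (sigma_t / sigma_s) * x - alpha_t * math.expm1(-h) * d
```

only one new network evaluation (`x0_pred`) is needed per step; the second-order correction reuses `x0_prev` from the step before.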

soft-diffusion

a sketch/draft of google's new paper, "soft diffusion: score matching for general corruptions", which achieves state-of-the-art results on the celeba-64 dataset.

details can be found here