How-Diffusion-Models-Work

Notes from How Diffusion Models Work by DeepLearning.ai


Contents

Intuition

Sampling

  • With Extra Noise

Training

Context Embedding

Faster Sampling


Notes

Taught by Sharon Zhou

Noted by Atul

image

  • Example used throughout the course: generating 16×16 sprites for video games.

Intuition

  • Goal: given many sprite images, generate even more novel sprite images

image

  • What does the network learn?

    • Fine details
    • General outline
    • Everything in between
  • Noising Process (illustrated with Bob the sprite dissolving like a drop of ink in water)

image
  • Denoising Process (what should the NN think?)

    • If it's Bob the sprite, keep it as it is
    • If it's likely to be Bob, suggest more details to fill in
    • If it's just the outline of a sprite, suggest general details for a likely sprite (Bob/Fred/...)
    • If it's nothing, suggest the outline of a sprite
  • Give the NN input noise, whose pixels are sampled from a normal distribution, and get a completely new sprite!

Sampling

  • Assume you have a trained NN
  • At each denoising step, it predicts the noise in the current image and subtracts it to get a better image
  • NOTE: at each denoising step (except the last), some fresh random noise is added back in; without it, samples collapse toward a blurry average ("mode collapse")
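
The steps above can be sketched in NumPy. This is a minimal illustration of one denoising step under a DDPM-style schedule; the names `alpha`, `alpha_bar`, and `beta` are assumed schedule arrays, and `pred_noise` stands in for the network's output:

```python
import numpy as np

def denoise_add_noise(x, t, pred_noise, alpha, alpha_bar, beta):
    # Fresh noise, scaled by the schedule; skipped at the final step (t == 1)
    z = np.random.randn(*x.shape) if t > 1 else 0.0
    extra_noise = np.sqrt(beta[t]) * z
    # Remove the portion of noise the network predicted, then rescale the mean
    mean = (x - pred_noise * ((1 - alpha[t]) / np.sqrt(1 - alpha_bar[t]))) / np.sqrt(alpha[t])
    return mean + extra_noise
```

In a full sampler this function would be called in a loop from t = T down to 1, with `pred_noise` coming from the trained UNet at each step.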

Neural Network

  • UNet Architecture
    • Input and output of same size
    • First used for image segmentation

image

  • Takes a noisy image, compresses it into a small embedding space by downsampling, then upsamples to predict the noise

  • Can take more information in the form of embeddings

    • Time embedding: tied to the timestep, and hence the noise level
    • Context embedding: guides the generation process
  • Check out forward() in the sampling notebook

image
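
One way such embeddings can be folded into the upsampling path is to let the context embedding scale the feature map while the time embedding shifts it. A minimal NumPy sketch of that idea (the function name and shapes are illustrative, not the notebook's exact API):

```python
import numpy as np

def inject_embeddings(up_feat, t_emb, c_emb):
    # up_feat: feature map of shape (C, H, W)
    # t_emb, c_emb: per-channel embeddings of shape (C,)
    # Context scales each channel; time embedding shifts it
    return c_emb[:, None, None] * up_feat + t_emb[:, None, None]
```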

Training

Learns the distribution of what is "not noise"

  • Randomly sample a training image, a timestep t, and noise
    • The timestep controls the level of noise added
    • Randomization across images and timesteps makes training stable
  • Add the noise to the image
  • Feed the noised image into the NN, which predicts the noise
  • Compute the loss between the actual and predicted noise
  • Backprop and learn

image
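
The training steps above can be sketched as a single NumPy function; `predict_noise` is a hypothetical stand-in for the UNet, and `alpha_bar` is an assumed cumulative noise-schedule array:

```python
import numpy as np

def training_step(x0, alpha_bar, predict_noise):
    T = len(alpha_bar) - 1
    t = np.random.randint(1, T + 1)                      # random timestep
    noise = np.random.randn(*x0.shape)                   # random noise
    ab = alpha_bar[t]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise   # noised image
    pred = predict_noise(x_t, t)                         # network's guess
    return np.mean((noise - pred) ** 2)                  # MSE on the noise
```

In the real notebook this loss would be backpropagated through the UNet; here the return value just shows what is being minimized.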

Control

  • Embeddings are vectors; for instance, text strings represented as vectors of numbers
  • Given as input to the NN along with the training image
  • Become associated with a training example and its properties
  • Use: generate funky mixtures by combining embeddings
  • Context formats
    • Text
    • Categories, one-hot encoded (e.g. hero, non-hero, spells ...)

image
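
A minimal sketch of the one-hot context format; the category list here is illustrative, not the course's exact set:

```python
import numpy as np

# Illustrative sprite categories (hypothetical list, not the course's exact one)
CATEGORIES = ["hero", "non-hero", "spell"]

def one_hot_context(category, categories=CATEGORIES):
    # Encode a category name as a one-hot context vector for the NN
    vec = np.zeros(len(categories))
    vec[categories.index(category)] = 1.0
    return vec
```

Mixing such vectors, e.g. `0.6 * one_hot_context("hero") + 0.4 * one_hot_context("spell")`, is one way to ask for the "funky mixtures" mentioned above.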

Fast Sampling: DDIM

  • DDPM is slow!
    • Many timesteps, and each step depends on the previous one (Markovian)
  • DDIM breaks the Markov assumption: it is deterministic and can skip timesteps
  • Quality is lower than DDPM's, but sampling is much faster
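
One DDIM step can be sketched as: estimate the clean image from the predicted noise, then jump directly to an earlier timestep without injecting fresh noise. A hedged NumPy illustration (`ab_t` and `ab_prev` are assumed cumulative-product noise levels at the current and target timesteps):

```python
import numpy as np

def ddim_step(x_t, pred_noise, ab_t, ab_prev):
    # Estimate the fully denoised image from the predicted noise
    x0_est = (x_t - np.sqrt(1.0 - ab_t) * pred_noise) / np.sqrt(ab_t)
    # Jump directly to the (possibly much earlier) target timestep;
    # no fresh noise is injected, so the step is deterministic
    return np.sqrt(ab_prev) * x0_est + np.sqrt(1.0 - ab_prev) * pred_noise
```

Because `ab_prev` need not be the immediately preceding timestep, a sampler built on this step can cover the schedule in far fewer iterations than DDPM.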

Summary

Other applications: music generation, inpainting, textual inversion