Simple Diffusion Image Generation

This was a project submitted to the University of Queensland for the course COMP3710.

Simple diffusion based image generation using PyTorch. This model can learn from a dataset of images and generate new images that are perceptually similar to those in the dataset.

[Figure: denoising timestep plot from epoch 94]

References

Huge thanks to these videos for helping my understanding:

Diffusion papers:

Contents

  • train.py - Command line utility that trains a new diffusion model on a dataset.
  • dataset.py - Wraps a directory of image files in a PyTorch dataloader. Images can be any size or format that PIL can open. All images are resized to a given dimension, converted to RGB and normalised to the range [-1, 1] (see the sketch after this list).
  • modules.py - Contains a Trainer class to handle training of the model. Contains the U-Net model and required components.
  • predict.py - Command line utility that generates new images from an existing .pth model.
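
As a rough illustration of the preprocessing that dataset.py describes, here is a minimal sketch of such a wrapper. The class name and constructor arguments are hypothetical, not the actual API:

    from pathlib import Path

    from PIL import Image
    from torch.utils.data import DataLoader, Dataset
    from torchvision import transforms

    class ImageFolderDataset(Dataset):
        """Hypothetical wrapper: resize, convert to RGB, normalise to [-1, 1]."""
        def __init__(self, root, image_size=64):
            self.paths = sorted(p for p in Path(root).iterdir() if p.is_file())
            self.transform = transforms.Compose([
                transforms.Resize((image_size, image_size)),
                transforms.ToTensor(),                                   # scales to [0, 1]
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # -> [-1, 1]
            ])

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            return self.transform(Image.open(self.paths[idx]).convert("RGB"))

    loader = DataLoader(ImageFolderDataset("./images"), batch_size=64, shuffle=True)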

Usage

Prerequisites

  • A system (preferably Linux) with either Anaconda or Miniconda installed.
  • A GPU with at least 12 GB of memory if you plan to train models.

Setup

  1. Clone this branch and cd to the recognition/45802492_SimpleDiffusion/ folder

  2. Set up a new conda environment. An environment.yml file is supplied to do this automatically.

    conda env create -f environment.yml
    conda activate diff

Train a model

  1. Create a folder with training images in the local directory (e.g. PatternFlow/recognition/images). There are no requirements on image size or naming. All images within this folder will be resized and used to train the model.

  2. Run the training script: python train.py name path. Every epoch, a test image is generated and saved to ./out, and a denoising timestep plot is saved to ./plot.

  3. TensorBoard is also supported; training logs are saved to ./runs. Launch TensorBoard with tensorboard --logdir ./ to view loss metrics during training.

  4. Once training has finished, the model is saved as name.pth in the local directory. Additionally, an autosave.pth file is written every epoch.

Parameters for train.py

Parameter               Short   Required   Default   Description
name                            required             Name of the model
path                            required             Path to the dataset folder
--timesteps             -t      optional   1000      Number of diffusion timesteps in the beta schedule
--epochs                -e      optional   100       Number of epochs to train for
--batch_size            -b      optional   64        Training batch size
--image_size            -i      optional   64        Image dimension; all images are resized to size x size
--beta_schedule         -s      optional   linear    Beta schedule type: 'linear', 'cosine', 'quadratic' or 'sigmoid'
--disable_images                optional             Disables saving images and plots every epoch
--disable_tensorboard           optional             Disables tensorboard logging during training
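
For example, to train a model called knee on images in ./images at 128x128 with a cosine beta schedule (the model name and dataset path here are just placeholders):

    python train.py knee ./images --epochs 100 --batch_size 64 --image_size 128 --beta_schedule cosine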

Using an existing model

  1. Run the predict script: python predict.py model
  2. A random image will be generated using the supplied model and saved.

Parameters for predict.py

Parameter      Short   Required   Default   Description
model                  required             Path to the .pth model file
--output       -o      optional   ./        Output path to save generated images
--name         -n      optional   predict   Name prefix to use for generated images
--num_images   -i      optional   1         Number of images to create
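
For example, to generate four images from a trained model (paths and names here are placeholders):

    python predict.py knee.pth --output ./samples --name sample --num_images 4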

Some pretrained models are supplied in the examples section below.

Algorithm Description

[Figure: diffusion process diagram]

Diffusion image generation is described in these papers: 1, 2. These models define a Markov chain in which Gaussian noise is successively added to an image over a fixed number of timesteps $T$ according to a variance schedule $\beta_1, \ldots, \beta_T$:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t;\, \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\, \beta_t\mathbf{I}\right)$$

This is called the forward diffusion process. The reverse diffusion process runs in the opposite direction: given a noisy image $\mathbf{x}_t$ at timestep $t$, a slightly less noisy image $\mathbf{x}_{t-1}$ is given by:

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sigma_t\mathbf{z}$$

where $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, $\boldsymbol{\epsilon}_\theta$ is the noise predicted by the network, $\sigma_t^2 = \beta_t$ and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
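
A useful property of the forward process is its closed form, $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, which lets training noise an image to any timestep in a single step. A minimal PyTorch sketch, assuming a linear beta schedule (function names are illustrative, not this repository's code):

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)          # linear beta schedule
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t

    def q_sample(x0, t, noise):
        """Noise a batch of images x0 directly to timesteps t (forward process)."""
        abar = alphas_cumprod[t].view(-1, 1, 1, 1)
        return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise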

A U-Net neural network is then trained to predict the noise in an image at a given timestep. To condition the network on time, the timestep $t$ is positionally encoded with sinusoidal embeddings and injected between the convolutional layers in the U-Net blocks. Training is performed by taking large numbers of images from a dataset and adding noise with the forward diffusion process; the U-Net receives the noisy image and timestep as input, with the added noise as the target.
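
A sketch of both ideas, continuing from the snippet above (the actual Trainer in modules.py will differ in detail):

    import math
    import torch
    import torch.nn.functional as F

    def sinusoidal_embedding(t, dim):
        """Encode integer timesteps t as standard sinusoidal position embeddings."""
        half = dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / (half - 1))
        args = t[:, None].float() * freqs[None, :]
        return torch.cat([args.sin(), args.cos()], dim=-1)   # shape (batch, dim)

    def training_step(model, x0):
        """One training step: add noise at random timesteps, predict that noise."""
        t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
        noise = torch.randn_like(x0)
        x_t = q_sample(x0, t, noise)     # forward process, from the sketch above
        predicted = model(x_t, t)        # the U-Net embeds t internally
        return F.mse_loss(predicted, noise)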

Once the U-Net has been trained, denoising can be performed on a random point in latent space (usually an image of pure Gaussian noise) by repeatedly subtracting the predicted noise over the entire reverse timestep range. This results in a new image that is perceptually similar to those in the training dataset.
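
In code, the sampling loop looks roughly like this (DDPM sampling with $\sigma_t^2 = \beta_t$, reusing the schedule defined earlier; again only a sketch, not this repository's predict.py):

    @torch.no_grad()
    def sample(model, shape):
        """Reverse diffusion: start from pure noise and denoise for T steps."""
        x = torch.randn(shape)
        for t in reversed(range(T)):
            t_batch = torch.full((shape[0],), t, dtype=torch.long)
            eps = model(x, t_batch)                                # predicted noise
            z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            beta, alpha, abar = betas[t], alphas[t], alphas_cumprod[t]
            # x_{t-1} = (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t) + sigma_t * z
            x = (x - beta / (1.0 - abar).sqrt() * eps) / alpha.sqrt() + beta.sqrt() * z
        return x.clamp(-1, 1)    # images were normalised to [-1, 1]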

This project uses a simplified U-Net design, omitting some of the features described in the papers above. The general architecture is shown below:

[Figure: U-Net architecture]

Examples

AKOA Knee

Trained on part of the AKOA Knee dataset, consisting of 18,681 MRI images. Image size 128x128, batch size 64, 1000 timesteps, 100 epochs. Download the pretrained model.

Training

[Denoising timestep plots at epochs 0, 10, 20 and 99]

Examples after training

[Generated knee MRI samples]

OASIS Brain

Trained on the OASIS Brain dataset, consisting of 11,329 images. Image size 128x128, 1000 timesteps, batch size 32, 100 epochs. Notice the artifacts due to the small batch size. Download the pre-trained model.

Training

[Denoising timestep plots at epochs 0, 10, 20 and 99]

Examples after training

[Generated brain MRI samples]

CelebA Dataset

Just for fun, the model was also trained on the CelebA dataset (aligned and cropped), consisting of around 200,000 images. Image size 128x128, batch size 64, 1000 timesteps, 100 epochs. Download the pre-trained model. The network does well on faces but struggles to generate hair and backgrounds.

Training

[Denoising timestep plots at epochs 0, 10, 20 and 99]

Examples after training

[Generated face samples]