This was a project submitted to the University of Queensland for the course COMP3710.
Simple diffusion-based image generation using PyTorch. This model can learn from a dataset of images and generate new images that are perceptually similar to those in the dataset.
Huge thanks to these videos for helping my understanding:
- Diffusion models from scratch in PyTorch
  - This repo was built largely by referencing code in the colab notebook from this video. Quite a few changes were made to improve performance.
- Diffusion Models | Paper Explanation | Math Explained
Diffusion papers:
- `train.py` - Command line utility that trains a new diffusion model on a dataset.
- `dataset.py` - Wraps a directory of image files in a PyTorch dataloader. Images can be any size or format that can be opened by PIL. All images are resized to a given dimension, converted to RGB and normalised to a range of -1 to 1 (a minimal sketch of this is shown after this list).
- `modules.py` - Contains a Trainer class to handle training of the model. Contains the U-Net model and required components.
- `predict.py` - Command line utility to generate new images from an existing `.pth` model.
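To make the preprocessing concrete, here is a minimal sketch of what a dataset wrapper like this might look like. The class name and details are illustrative assumptions, not the repo's actual code:

```python
import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class ImageFolderDataset(Dataset):
    """Hypothetical sketch: wraps a flat directory of images for training."""
    def __init__(self, path, image_size=64):
        self.files = [os.path.join(path, f) for f in os.listdir(path)]
        self.transform = transforms.Compose([
            transforms.Resize((image_size, image_size)),  # resize to a fixed dimension
            transforms.ToTensor(),                        # PIL image -> tensor in [0, 1]
            transforms.Lambda(lambda t: t * 2 - 1),       # normalise to [-1, 1]
        ])

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = Image.open(self.files[idx]).convert("RGB")  # any PIL-readable format
        return self.transform(img)

# Example usage (assumes an `images/` folder in the working directory):
# loader = DataLoader(ImageFolderDataset("images"), batch_size=64, shuffle=True)
```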
- A system (preferably linux) with either Anaconda or Miniconda installed.
- A GPU with at least 12GB of memory if you plan to train models.
- Clone this branch and cd to the `recognition/45802492_SimpleDiffusion/` folder.
- Setup a new conda environment. An `environment.yml` file is supplied to do this automatically: run `conda env create -f environment.yml`, then `conda activate diff`.
- Create a folder with training images in the local directory (eg. `PatternFlow/recognition/images`). There are no requirements on image size or naming. All images within this folder will be resized and used to train the model.
- Run the training script: `python train.py name path`, which will start training. Every epoch a test image will be generated and saved to `./out`, and a denoising timestep plot will be saved to `./plot`.
- Tensorboard is also supported, and training logs are saved to `./runs`. You can launch tensorboard using `tensorboard --logdir ./` to view loss metrics during training.
- Once training has finished, the model will be saved as `name.pth` in the local directory. Additionally, an `autosave.pth` file is created every epoch.
Parameters for train.py
| Parameter | Short | Required | Default | Description |
|---|---|---|---|---|
| name | | required | | Name of model |
| path | | required | | Path to dataset folder |
| --timesteps | -t | optional | 1000 | Number of diffusion timesteps in betas schedule |
| --epochs | -e | optional | 100 | Number of epochs to train for |
| --batch_size | -b | optional | 64 | Training batch size |
| --image_size | -i | optional | 64 | Image dimension. All images are resized to size x size |
| --beta_schedule | -s | optional | linear | Beta schedule type. Options: 'linear', 'cosine', 'quadratic' and 'sigmoid' (sketched below) |
| --disable_images | | optional | | Disables saving images and plots every epoch |
| --disable_tensorboard | | optional | | Disables tensorboard for training |
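For context on `--beta_schedule`: the four options are standard ways of spacing the noise variances across the timesteps. The sketch below follows the conventions popularised by the DDPM and improved-DDPM papers; the exact constants used in this repo may differ:

```python
import math
import torch

def linear_beta_schedule(timesteps, start=1e-4, end=0.02):
    # Evenly spaced betas (the DDPM paper's default)
    return torch.linspace(start, end, timesteps)

def quadratic_beta_schedule(timesteps, start=1e-4, end=0.02):
    # Linear in sqrt-space, so betas grow quadratically
    return torch.linspace(start**0.5, end**0.5, timesteps) ** 2

def sigmoid_beta_schedule(timesteps, start=1e-4, end=0.02):
    # S-shaped ramp between start and end
    return torch.sigmoid(torch.linspace(-6, 6, timesteps)) * (end - start) + start

def cosine_beta_schedule(timesteps, s=0.008):
    # Improved DDPM (Nichol & Dhariwal 2021): derive betas from a cosine
    # cumulative-alpha curve, which adds noise more gently at both ends
    x = torch.linspace(0, timesteps, timesteps + 1)
    alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clamp(betas, 0.0001, 0.9999)
```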
- Run the predict script: `python predict.py model`
- A random image will be generated using the supplied model and saved.
Parameters for predict.py
| Parameter | Short | Required | Default | Description |
|---|---|---|---|---|
| model | | required | | Path to .pth model file |
| --output | -o | optional | ./ | Output path to save images |
| --name | -n | optional | predict | Name prefix to use for generated images |
| --num_images | -i | optional | 1 | Number of images to create |
Some pretrained models are supplied in the examples section below.
Diffusion image generation is described in these papers: 1, 2. They work by defining a Markov chain in which Gaussian noise is successively added to an image over a defined number of timesteps. This is called the forward diffusion process. The reverse diffusion process is the opposite: given an image at a certain timestep, the noise added at that step is estimated and removed to recover the image at the previous timestep.
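A useful property of the forward process is that the noisy image at any timestep can be sampled in closed form from the original image, rather than adding noise step by step. A minimal sketch in the usual DDPM notation, where `alphas_cumprod` is the cumulative product of (1 - beta):

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)            # any beta schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # "alpha bar" in the papers

def q_sample(x0, t, alphas_cumprod):
    """Sample x_t ~ q(x_t | x_0) directly for a batch of timesteps t."""
    noise = torch.randn_like(x0)
    sqrt_ab = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    sqrt_one_minus_ab = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return sqrt_ab * x0 + sqrt_one_minus_ab * noise, noise
```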
A U-Net neural network is then trained to predict the noise in an image at a given timestep. To do this, the timestep is encoded (commonly with a sinusoidal position embedding, as in the referenced colab) and passed to the network alongside the noisy image.
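A sketch of such an embedding, and of the training objective, which is a simple MSE between the true and predicted noise (the repo's exact implementation may differ):

```python
import math
import torch
import torch.nn.functional as F

def timestep_embedding(t, dim):
    """Sinusoidal embedding of integer timesteps, as in Transformers/DDPM."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half).float() / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([args.sin(), args.cos()], dim=-1)  # shape (batch, dim)

# Hypothetical training step, reusing q_sample from above:
# t = torch.randint(0, T, (x0.shape[0],))       # random timestep per image
# x_t, noise = q_sample(x0, t, alphas_cumprod)  # noised images + true noise
# loss = F.mse_loss(model(x_t, t), noise)       # U-Net regresses the noise
```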
Once the U-Net has been trained, denoising can be performed from a random point in latent space (usually an image of pure Gaussian noise) by repeatedly subtracting the predicted noise over the entire reverse timestep range. This results in a new image that is perceptually similar to those in the training dataset.
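As a sketch of that denoising loop (DDPM ancestral sampling, Algorithm 2 in Ho et al.; assumes a `model(x, t)` that returns the predicted noise, as above):

```python
import torch

@torch.no_grad()
def sample(model, shape, betas):
    """Generate images by iterating the reverse diffusion process."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        pred_noise = model(x, t_batch)
        # Remove the predicted noise to estimate the mean of x_{t-1}
        x = (x - betas[t] / (1 - alphas_cumprod[t]).sqrt() * pred_noise) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # stochastic sampling noise
    return x
```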
This project uses a simplified U-Net design, omitting some of the features described in the papers above.
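As an illustration only (the actual components live in `modules.py` and will differ in detail), one building block of such a simplified U-Net might mix the timestep embedding into each convolutional stage like this:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Hypothetical U-Net stage: two convs with the timestep embedding mixed in."""
    def __init__(self, in_ch, out_ch, time_dim):
        super().__init__()
        self.time_mlp = nn.Linear(time_dim, out_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x, t_emb):
        h = self.act(self.conv1(x))
        h = h + self.time_mlp(t_emb)[:, :, None, None]  # broadcast over H and W
        return self.act(self.conv2(h))
```

Stacking such blocks with downsampling, then upsampling with skip connections, gives the familiar U-Net shape.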
Using part of the AKOA Knee dataset, consisting of 18,681 MRI images. Image size 128x128, batch size 64, 1000 timesteps, 100 epochs. Download the pretrained model.
Epoch 0 Epoch 10 Epoch 20 Epoch 99
Using the OASIS Brain dataset with 11,329 images. Image size 128x128, 1000 timesteps, batch size 32, 100 epochs. Notice the artifacts due to the small batch size. Download the pre-trained model.
Epoch 0 Epoch 10 Epoch 20 Epoch 99
Just for fun, the model was also trained on the CelebA dataset (aligned and cropped), consisting of around 200,000 images. Image size 128x128, batch size 64, 1000 timesteps, 100 epochs. Download the pre-trained model. The network does well with the faces but struggles to generate hair and backgrounds.