An implementation of Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al., 2022) for PyTorch, with enhancements and additional features, such as improved sampling algorithms and transformer-based diffusion models.
k-diffusion can be installed via PyPI (pip install k-diffusion) but it will not include training and inference scripts, only library code that others can depend on. To run the training and inference scripts, clone this repository and run pip install -e <path to repository>.
To train models:
$ ./train.py --config CONFIG_FILE --name RUN_NAME
For instance, to train a model on MNIST:
$ ./train.py --config configs/config_mnist_transformer.json --name RUN_NAME
The configuration file allows you to specify the dataset type. Currently supported types are "imagefolder" (finds all images in that folder and its subfolders, recursively), "cifar10" (CIFAR-10), and "mnist" (MNIST). "huggingface" (Hugging Face Datasets) is also supported.
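As a rough illustration of the dataset type setting, the sketch below checks a configuration dictionary against the types listed above. The key names "dataset", "type", and "location", as well as the helper check_dataset_type, are assumptions made for this example and do not describe k-diffusion's actual config schema or loader; the files under configs/ show the real layout.

```python
import json

# Dataset types named in this README.
SUPPORTED_TYPES = {"imagefolder", "cifar10", "mnist", "huggingface"}


def check_dataset_type(config):
    """Return the dataset type from a config dict, or raise ValueError.

    Hypothetical helper: assumes the type is stored at config["dataset"]["type"].
    """
    dataset_type = config.get("dataset", {}).get("type")
    if dataset_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported dataset type: {dataset_type!r}")
    return dataset_type


# Hypothetical config fragment; "location" is an illustrative key.
example = json.loads(
    '{"dataset": {"type": "imagefolder", "location": "path/to/images"}}'
)
print(check_dataset_type(example))
```

A check like this fails early on a mistyped dataset type instead of partway through training setup.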
Multi-GPU and multi-node training is supported with Hugging Face Accelerate. You can configure Accelerate by running:
$ accelerate config
then running:
$ accelerate launch train.py --config CONFIG_FILE --name RUN_NAME
- k-diffusion has support for training transformer-based diffusion models (like DiT but improved).
- k-diffusion supports a soft version of Min-SNR loss weighting for improved training at high resolutions with fewer hyperparameters than the loss weighting used in Karras et al. (2022).
- k-diffusion has wrappers for v-diffusion-pytorch, OpenAI diffusion, and CompVis diffusion models, allowing them to be used with its samplers and ODE/SDE.
- k-diffusion implements DPM-Solver, which produces higher quality samples at the same number of function evaluations as Karras Algorithm 2, as well as supporting adaptive step size control. DPM-Solver++(2S) and (2M) are also implemented for improved quality with low numbers of steps.
- k-diffusion supports CLIP guided sampling from unconditional diffusion models (see sample_clip_guided.py).
- k-diffusion supports log likelihood calculation (not a variational lower bound) for native models and all wrapped models.
- k-diffusion can calculate, during training, the FID and KID vs the training set.
- k-diffusion can calculate, during training, the gradient noise scale (1 / SNR), from An Empirical Model of Large-Batch Training (https://arxiv.org/abs/1812.06162).
- Latent diffusion
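To make the sampler and ODE terminology in the list above concrete, here is a minimal pure-Python sketch of Euler sampling on the probability flow ODE from Karras et al. (2022), dx/dsigma = (x - D(x, sigma)) / sigma, using that paper's noise schedule. It is a toy under stated assumptions, not k-diffusion's implementation: the function names toy_karras_sigmas, toy_ideal_denoiser, and toy_euler_sample are invented for this example, the data is assumed to be a one-dimensional Gaussian with standard deviation 0.5 so that the ideal denoiser has a closed form, and the schedule parameters are illustrative.

```python
import math


def toy_karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Noise levels following the Karras et al. (2022) schedule, plus a final 0."""
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = [
        (max_inv_rho + i / (n - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n)
    ]
    return sigmas + [0.0]


def toy_ideal_denoiser(x, sigma, sigma_data=0.5):
    """Exact denoiser when the data is 1-D Gaussian N(0, sigma_data^2)."""
    return x * sigma_data ** 2 / (sigma_data ** 2 + sigma ** 2)


def toy_euler_sample(denoiser, x, sigmas):
    """Euler steps on dx/dsigma = (x - D(x, sigma)) / sigma."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma  # ODE derivative at this noise level
        x = x + d * (sigma_next - sigma)      # step toward the next, lower level
    return x


sigma_max = 80.0
sigmas = toy_karras_sigmas(200, sigma_min=0.01, sigma_max=sigma_max)
x_start = 80.0  # stands in for one draw of noise at scale sigma_max
x_end = toy_euler_sample(toy_ideal_denoiser, x_start, sigmas)

# For this Gaussian toy problem the ODE has a closed-form solution at sigma = 0.
x_exact = x_start * 0.5 / math.sqrt(0.5 ** 2 + sigma_max ** 2)
print(x_end, x_exact)
```

Because Euler is a first-order method, the gap between x_end and x_exact shrinks roughly in proportion to the number of steps. Higher-order solvers such as those named above aim to reach similar accuracy with fewer denoiser evaluations.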