A curated list of methods that focus on improving the efficiency of diffusion models
I'm trying to update this list weekly (every Monday morning) from my personal knowledge stack, and to collect each conference's proceedings. If you find this repo useful, please consider ★starring it or ☛contributing to it.
- [2024/07/08] Reorganizing the catalogs
- [2024/07/09] (in progress) Filling in existing surveys
Recommended introductory learning materials
- David Saxton's Tutorial on Diffusion
- Yang Song's Post: Generative Modeling by Estimating Gradients of the Data Distribution
- EfficientML Course (MIT, Song Han), the Diffusion Chapter
Formulations of diffusion and development of the theory
- [DPM] "Deep Unsupervised Learning using Nonequilibrium Thermodynamics";
  - Early formulation of diffusion models
  - 2015/03 | ICML15 | [Paper]
- [DDPM] "Denoising Diffusion Probabilistic Models";
  - 2020/06 | NeurIPS20 | [Paper]
  - The discrete-time formulation of diffusion
- [SDE-based Diffusion] "Score-Based Generative Modeling through Stochastic Differential Equations";
  - 2020/11 | ICLR21 | [Paper]
  - Continuous-time neural SDE formulation of diffusion
How to introduce control signals
- [Classifier-based Guidance] "Diffusion Models Beat GANs on Image Synthesis";
  - 2021/05 | Arxiv2105 | [Paper]
  - Introduces a control signal through the gradient of a classifier on noisy images (see the sketch below)
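A minimal sketch of the classifier-guided noise prediction, assuming hypothetical `eps_model` and `classifier` callables and the cumulative schedule term `alpha_bar_t`:

```python
# Minimal classifier-guidance sketch (hypothetical `eps_model`/`classifier`):
# the classifier's gradient w.r.t. the noisy input x_t steers sampling toward
# class y, following eps_hat = eps - sqrt(1 - alpha_bar_t) * scale * grad.
import torch

def classifier_guided_eps(eps_model, classifier, x_t, t, y, scale, alpha_bar_t):
    eps = eps_model(x_t, t)  # unguided noise prediction
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = classifier(x_in, t).log_softmax(dim=-1)
        selected = log_probs[range(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]  # grad of log p(y | x_t)
    return eps - (1.0 - alpha_bar_t) ** 0.5 * scale * grad
```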
- [Classifier-free Guidance (CFG)] "Classifier-Free Diffusion Guidance";
  - 2022/07 | NeurIPS 2021 Workshop | [Paper]
  - Introduces CFG: jointly train a conditional and an unconditional diffusion model, then combine their predictions at sampling time (see the sketch below)
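A minimal sketch of the CFG combination at sampling time, assuming a hypothetical `eps_model` that accepts a conditioning input (`null_cond` stands for the unconditional/empty embedding):

```python
# Minimal CFG sketch: one model evaluated twice, once with the condition and
# once with a null condition, then extrapolated by the guidance scale.
def cfg_eps(eps_model, x_t, t, cond, null_cond, guidance_scale=7.5):
    eps_uncond = eps_model(x_t, t, null_cond)  # unconditional branch
    eps_cond = eps_model(x_t, t, cond)         # conditional branch
    # Push the prediction further in the direction the condition suggests.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```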
- [LDM] "High-Resolution Image Synthesis with Latent Diffusion Models";
  - 2021/12 | CVPR22 | [Paper]
  - Runs diffusion in the compressed latent space of a VAE, reducing the cost of every denoising step
- [DDIM] "Denoising Diffusion Implicit Models";
  - 2020/10 | ICLR21 | [Paper]
  - Deterministic sampling; skips timesteps
- [DPMSolver] "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
  - 2022/06 | NeurIPS22 | [Paper]
  - Exploits the semi-linear structure of the diffusion ODE; converges in 10-20 steps
- [DPMSolver++] "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models";
  - 2022/11 | Arxiv | [Paper]
  - High-order multistep ODE solver tailored to guided sampling; faster convergence (see the sketch below)
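A sketch of fast sampling with the `diffusers` library, assuming it and the Stable Diffusion v1.5 checkpoint are available: swapping in `DPMSolverMultistepScheduler` typically yields good samples in about 20 steps.

```python
# Fast sampling sketch with diffusers: swap the pipeline's default scheduler
# for DPM-Solver++ (multistep) and cut the step count to ~20.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=20).images[0]
```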
Text Encoder
- [CLIP] "Learning Transferable Visual Models From Natural Language Supervision";
  - 2021/03 | Arxiv | [Paper]
  - Text encoder of Stable Diffusion v1 (see the usage sketch after this list)
  - Containing Operations:
    - Self-Attention (Cross-Attention)
    - FFN (FC)
    - LayerNorm (GroupNorm)
- [T5] "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer";
  - 2019/10 | Arxiv | [Paper]
  - Containing Operations:
    - Self-Attention (Cross-Attention)
    - FFN (FC)
    - LayerNorm (GroupNorm)
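The CLIP usage sketch referenced above, using the `transformers` library (`openai/clip-vit-large-patch14` is the checkpoint Stable Diffusion v1 uses); the per-token hidden states are what the diffusion U-Net cross-attends to:

```python
# Sketch: how a text-to-image pipeline obtains text embeddings from CLIP.
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"  # SD v1's text encoder
tokenizer = CLIPTokenizer.from_pretrained(name)
text_encoder = CLIPTextModel.from_pretrained(name)

tokens = tokenizer(["a cat on a mat"], padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
# (batch, 77, 768) hidden states, consumed by the U-Net's cross-attention.
text_embeddings = text_encoder(tokens.input_ids).last_hidden_state
```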
A summary of the text encoders adopted by large text-to-image models, from the Kling-AI Technical Report
VAE (for latent-space diffusion)
- [VAE] "Tutorial on Variational Autoencoders";
  - 2016/06 | Arxiv | [Paper]
  - Containing Operations:
    - Conv
    - DeConv (ConvTransposed, Interpolation)
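A sketch of the pixel-to-latent round trip with `diffusers`' `AutoencoderKL` (the SD v1.5 VAE is one example; the random tensor stands in for a real image scaled to [-1, 1]):

```python
# Sketch: encode an image into the 8x-downsampled latent space and decode back.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5",
                                    subfolder="vae")
img = torch.randn(1, 3, 512, 512)  # stand-in for a real image in [-1, 1]
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()      # (1, 4, 64, 64)
    latents = latents * vae.config.scaling_factor       # ~0.18215 for SD v1
    recon = vae.decode(latents / vae.config.scaling_factor).sample
```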
Diffusion Network
- [U-Net] "U-Net: Convolutional Networks for Biomedical Image Segmentation";
  - 2015/05 | Arxiv | [Paper]
  - Containing Operations:
    - Conv
    - DeConv (ConvTransposed, Interpolation)
    - Long-range Shortcut Connection (see the toy sketch after this list)
- [DiT] "Scalable Diffusion Models with Transformers";
  - 2022/12 | ICCV23 | [Paper]
  - Replaces the U-Net with a transformer over latent patches
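A toy sketch of the long-range shortcut connection noted in the U-Net entry (not the paper's exact architecture): encoder features are concatenated with upsampled decoder features at the same resolution.

```python
# Toy U-Net-style block: the long-range skip carries full-resolution encoder
# features around the downsample/upsample bottleneck via concatenation.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv2d(3, ch, 3, padding=1)
        self.down = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.dec = nn.Conv2d(ch * 2, 3, 3, padding=1)  # ch*2: skip concatenated

    def forward(self, x):
        skip = self.enc(x)            # encoder features, full resolution
        h = self.up(self.down(skip))  # bottleneck, then upsample back
        return self.dec(torch.cat([h, skip], dim=1))  # long-range skip
```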
Upscaler
- [Imagen] "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding";
  - 2022/05 | NeurIPS22 | [Paper]
- [DeepFloyd-IF] "DeepFloyd IF";
  - 2023/04 | Arxiv | Stability.AI | [Technical Report] | [Code]
  - Larger language model (T5 over CLIP) | pixel-space diffusion | diffusion for SR
Datasets
- CIFAR-10:
- CelebA:
Metrics
- Fréchet Inception Distance (FID): compares two sets of images by the distance between their InceptionNet intermediate-feature distributions (reference vs. generated); lower is better (see the sketch below)
- Kernel Inception Distance (KID)
- Inception Score (IS)
- Limitations of the Inception-based metrics:
  - Can be misleading for models trained on large image-caption datasets (e.g., LAION-5B), since Inception is pre-trained on ImageNet-1K (and Stable Diffusion's pre-training set may overlap with it)
  - Sensitive to the specific Inception model used during computation
  - Sensitive to the image format (not the same if we start from PNGs vs. JPGs)
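A sketch of computing FID with `torchmetrics` (random uint8 tensors stand in for the reference and generated batches; in practice use thousands of images). The caveats above apply: scores depend on the Inception weights and on preprocessing.

```python
# FID sketch with torchmetrics: accumulate real and generated batches,
# then compute the feature-distribution distance (lower is better).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)  # stand-in
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)  # stand-in
fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # lower is better
```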
- CLIP score: measures the compatibility of an image-text pair; higher is better (see the sketch below)
- CLIP directional similarity: measures the consistency between the change across two images and the change across their captions (useful for editing)
  - Limitation: the captions were crawled from the web and may not align with human descriptions
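A sketch of CLIP score with `torchmetrics` (the model name is one common choice; random tensors stand in for generated images):

```python
# CLIP score sketch: higher means the image matches the caption better.
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
images = torch.randint(0, 256, (4, 3, 224, 224), dtype=torch.uint8)  # stand-in
score = metric(images, ["a photo of a cat"] * 4)
print(score)
```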
- Other metrics (referring to Schuture/Benchmarking-Awesome-Diffusion-Models)
Reduce the timesteps (the number of U-Net inferences)
- [DDIM] "Denoising Diffusion Implicit Models";
  - 2020/10 | ICLR21 | [Paper]
  - 📊 Key results: 50-100 steps instead of 1000 steps, with moderate performance loss
- [DPM-Solver] "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
  - 2022/06 | NeurIPS22 | [Paper]
  - 📊 Key results: NFE (number of U-Net forward passes) = 10 achieves performance similar to DDIM at NFE = 100
- [Catch-Up Distillation] "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling";
  - 2023/05 | Arxiv2305 | [Paper]
- [ReDi] "ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval";
  - Skips intermediate steps:
    - Retrieval: in the early stage, find a similar partially generated trajectory and jump ahead along it
  - 2023/02 | ICML23 | [Paper]
- [Consistency Model] "Consistency Models";
  - New objective: consistency-based
  - 2023/03 | Arxiv2303 | [Paper]
Reduce the cost of the diffusion model (the repeatedly invoked U-Net) with pruning / neural architecture search (NAS) techniques
- [Structural Pruning] "Structural Pruning for Diffusion Models";
  - 2023/05 | Arxiv2305 | [Paper]
Adaptively skip parts of the architecture across timesteps
Save computation under different sampling conditions (noise/prompt/task)
Reduce the processing resolution
- [PatchDiffusion] "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models";
  - 2023/04 | NeurIPS23 | [Paper]
- [MemEffPatchGen] "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation";
  - 2023/04 | CVPR23W | [Paper]
Quantization & low-bit inference/training
- [PTQD] "PTQD: Accurate Post-Training Quantization for Diffusion Models";
  - 2023/05 | NeurIPS23 | [Paper]
- [BiDiffusion] "Binary Latent Diffusion";
  - 2023/04 | Arxiv2304 | [Paper]
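Not any of the methods above, just a generic illustration of post-training quantization with stock PyTorch on a toy MLP: dynamic quantization converts Linear weights to int8 while keeping activations in fp32.

```python
# Generic post-training quantization sketch (not PTQD): int8 weights for
# Linear layers, fp32 activations, no retraining required.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)  # Linear layers replaced by dynamically quantized versions
```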
Parameter-efficient fine-tuning
- [DiffFit] "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning";
  - 2023/04 | Arxiv2304 | [Paper]
- [ParamEffTuningSummary] "A Closer Look at Parameter-Efficient Tuning in Diffusion Models";
  - 2023/03 | Arxiv2303 | [Paper]
The LoRA family
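A minimal sketch of the LoRA idea (a hypothetical `LoRALinear` wrapper, not any specific library's API): the frozen base weight is augmented with a trainable low-rank update scaled by `alpha / r`.

```python
# LoRA sketch: y = W x + (alpha / r) * B A x, with W frozen and only the
# low-rank factors A, B trained. B is zero-initialized so training starts
# as a no-op.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 4):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the original weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```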
Deployment on mobile devices
- [SnapFusion] "SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds";
  - Platform: iPhone 14 Pro, 1.84s per image
  - Model evolution: 3.8x fewer parameters than SD-V1.5
  - Step distillation down to 8 steps
  - 2023/06 | Arxiv2306 | [Paper]
Related repositories & resources
- heejkoo/Awesome-Diffusion-Models
- awesome-stable-diffusion/awesome-stable-diffusion
- hua1995116/awesome-ai-painting
- PRIV-Creation/Awesome-Diffusion-Personalization
- Schuture/Benchmarking-Awesome-Diffusion-Models
- shogi880/awesome-controllable-stable-diffusion
- Efficient Diffusion Models for Vision: A Survey
- Tracking Papers on Diffusion Models
This list is released under a Creative Commons license.