Awesome Efficient Diffusion

A curated list of methods that focus on improving the efficiency of diffusion models.

Updates

I'm trying to update this list weekly (every Monday morning) from my personal knowledge stack, and to collect each conference's proceedings. If you find this repo useful, please consider ★ starring it or ☛ contributing to it.

  • [2024/07/08] Reorganizing the catalogs
  • [2024/07/09] (Ongoing) Filling in existing surveys

Catalogs

Basics

Resources

Recommended introductory learning materials


Diffusion Formulation

formulations of diffusion and the development of the theory

  • [DPM] "Deep Unsupervised Learning using Nonequilibrium Thermodynamics";

    • Early development of the diffusion formulation
    • 2015/03 | ICML15 | [Paper]
  • [DDPM] "Denoising Diffusion Probabilistic Models";

    • 2020/06 | NeurIPS20 | [Paper]
    • Discrete-time diffusion formulation
  • [SDE-based Diffusion] "Score-Based Generative Modeling through Stochastic Differential Equations";

    • 2020/11 | ICLR21 | [Paper]
    • Continuous-time SDE formulation of diffusion

how to introduce control signals

  • [Classifier-based Guidance] "Diffusion Models Beat GANs on Image Synthesis";

    • 2021/05 | Arxiv2105 | [Paper]
    • Introduces control signals through the gradients of a pre-trained classifier
  • [Classifier-free Guidance (CFG)] "Classifier-Free Diffusion Guidance";

    • 2022/07 | NeurIPS 2021 Workshop | [Paper]
    • Introduces CFG: jointly train a conditional and an unconditional diffusion model and combine their predictions at sampling time (see the sketch after this list)
  • [LDM] "High-Resolution Image Synthesis with Latent Diffusion Models";

    • 2021/12 | CVPR22 | [Paper] | [Code]
    • Text-to-image conditioning with cross attention
    • Latent space diffusion model
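
A minimal sketch of the CFG combination at sampling time; the function and argument names are illustrative, not from any specific codebase:

```python
import torch

def cfg_noise_prediction(eps_cond: torch.Tensor,
                         eps_uncond: torch.Tensor,
                         guidance_scale: float = 7.5) -> torch.Tensor:
    """Extrapolate from the unconditional noise prediction toward the
    conditional one; scale 1.0 recovers plain conditional sampling,
    larger values trade sample diversity for prompt adherence."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Note that CFG doubles the per-step cost: the U-Net runs once with the condition and once without (often batched together).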

Solvers

  • [DDIM]: "Denoising Diffusion Implicit Models";

    • 2020/10 | ICLR21 | [Paper]
    • Deterministic sampling that allows skipping timesteps (a single-step sketch follows this list)
  • [DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";

    • 2022/06 | NeurIPS22 | [Paper]
    • Exploits the semi-linear structure of the diffusion ODE; converges in 10-20 steps
  • [DPMSolver++]: "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models";

    • 2022/11 | Arxiv | [Paper]
    • High-order multistep ODE solver tailored for guided sampling; faster convergence
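
A sketch of a single deterministic DDIM update (eta = 0), assuming the usual `alpha_bar` cumulative-product notation; skipping timesteps simply means taking `alpha_bar_prev` from a much earlier point on the noise schedule:

```python
import torch

def ddim_step(x_t: torch.Tensor, eps: torch.Tensor,
              alpha_bar_t: float, alpha_bar_prev: float) -> torch.Tensor:
    """One deterministic DDIM update (eta = 0)."""
    # Predict x_0 from the current sample and the model's noise estimate.
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    # Re-noise the prediction to the (possibly distant) previous timestep.
    return alpha_bar_prev ** 0.5 * x0_pred + (1 - alpha_bar_prev) ** 0.5 * eps
```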

Models

Key Components

Text Encoder

  • [CLIP] "Learning Transferable Visual Models From Natural Language Supervision";

    • 2021/03 | Arxiv | [Paper]
    • Containing Operations:
      • Self-Attention (Cross-Attention)
      • FFN (FC)
      • LayerNorm (GroupNorm)
  • [T5] "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer";

    • 2019/10 | Arxiv | [Paper]
    • Containing Operations:
      • Self-Attention (Cross-Attention)
      • FFN (FC)
      • LayerNorm (GroupNorm)

Summary of the text encoders adopted by large text-to-image models, from the Kling-AI technical report

VAE (for latent-space)

  • [VAE] "Tutorial on Variational Autoencoders";
    • 2016/06 | Arxiv | [Paper]
    • Containing Operations:
      • Conv
      • DeConv (ConvTransposed, Interpolation)
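
A minimal round-trip sketch of how latent diffusion models use the VAE, via the HuggingFace `diffusers` API; the checkpoint ID and the `0.18215` latent scale are the Stable Diffusion v1 conventions, used here as assumptions:

```python
import torch
from diffusers import AutoencoderKL

# Checkpoint ID and the 0.18215 latent scale are SD-v1 conventions.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def to_latent(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, H, W) in [-1, 1] -> (B, 4, H/8, W/8) latents."""
    return vae.encode(images).latent_dist.sample() * 0.18215

@torch.no_grad()
def from_latent(latents: torch.Tensor) -> torch.Tensor:
    """Invert the scaling and decode back to image space."""
    return vae.decode(latents / 0.18215).sample
```

The efficiency gain comes from running the expensive iterative denoising on the 8x-downsampled latents instead of raw pixels.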

Diffusion Network

  • [U-Net] "U-Net: Convolutional Networks for Biomedical Image Segmentation";

    • 2015/05 | Arxiv | [Paper]
    • Containing Operations:
      • Conv
      • DeConv (ConvTransposed, Interpolation)
      • Long-range Shortcut Connection
  • [DiT] "Scalable Diffusion Models with Transformers";

    • 2022/12 | ICCV23 | [Paper]
    • Transformer backbone for diffusion, replacing the U-Net

UpScaler

Open-sourced Models

  • [DeepFloyd IF] "DeepFloyd IF";

    • 2023/04 | Arxiv | Stability AI | [Technical Report] | [Code]
    • Larger Language Model (T5 over CLIP) | Pixel-space Diffusion | Diffusion for SR

Closed-source Models

  • [Imagen]: "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding";

    • 2022/05 | NeurIPS22 | [Paper]

Datasets

Unconditional

Class-Conditioned

  • CIFAR-10:

  • CelebA:

Text-to-image

Evaluation Metrics

  • InceptionDistance

    • Fréchet Inception Distance (FID): compares two sets of images via the distance between InceptionNet intermediate-feature statistics of reference and generated images; lower is better
    • Kernel Inception Distance
    • Inception Score
    • Limitation: less informative for models trained on large image-caption datasets (e.g., LAION-5B), since Inception is pre-trained on ImageNet-1K (Stable Diffusion's training set may also overlap with it). Scores further depend on:
      • The specific Inception model used during computation.
      • The image format (not the same if we start from PNGs vs JPGs).
  • Clip-related

    • CLIP score: measures the compatibility of an image-text pair (a computation sketch follows this list)
    • CLIP directional similarity: measures the consistency between the change across two images and the change across their two captions (useful for editing tasks)
    • Limitation: the captions were crawled from the web and may not align with human descriptions.
  • Other Metrics (referring to Schuture/Benchmarking-Awesome-Diffusion-Models)
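
A sketch of computing the CLIP score for one image-caption pair with HuggingFace `transformers`; the checkpoint ID is illustrative, and the `max(100 * cosine, 0)` rescaling follows common practice:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Checkpoint ID is illustrative; any CLIP checkpoint works.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image: Image.Image, caption: str) -> float:
    """Cosine similarity between image and text embeddings,
    rescaled by 100 and floored at 0."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return max(100.0 * (img * txt).sum().item(), 0.0)
```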

Miscellaneous

Video Generation

Customized Generation

Generate Complex Scene

Algorithm-level

Timestep Reduction

reduce the number of timesteps (the number of U-Net inferences)

Efficient Solver

  • [DDIM]: "Denoising Diffusion Implicit Models";

    • 2020/10 | ICLR21 | [Paper]
    • 📊 Key results: 50-100 steps -> 10-20 steps with moderate performance loss
  • [DPM-Solver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";

    • 2022/06 | NeurIPS | [Paper]
    • 📊 Key results: NFE (number of U-Net forward) = 10 achieves similar performance with DDIM NFE = 100

Timestep Distillation

  • [Catch-Up Distillation]: "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling";

    • 2023/05 | Arxiv2305 | [Paper]
  • [ReDi]: "ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval";

    • Skips intermediate steps via retrieval: finds a similar partially generated trajectory from an early stage and reuses it
    • 2023/02 | ICML23 | [Paper]
  • [Consistency Model]: "Consistency Models";

    • New consistency-based objective (a simplified distillation-loss sketch follows this list)
    • 2023/03 | Arxiv2303 | [Paper]
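
A simplified sketch of a consistency-distillation objective, assuming an MSE distance and illustrative callables (`student`, `ema_student`, `ode_step`); the actual papers use specific distance metrics, timestep schedules, and EMA rates:

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(student, ema_student, ode_step,
                                  x_next, t_next, t_cur):
    """One consistency-distillation term: both evaluations should map
    points on the same ODE trajectory to the same clean sample x_0.

    `ode_step(x, t_from, t_to)` is one solver step of a pretrained
    teacher (e.g., a DDIM update); `ema_student` is an exponential-
    moving-average copy of `student` used as the target network.
    """
    pred = student(x_next, t_next)               # f_theta(x_{t_{n+1}})
    with torch.no_grad():
        x_cur = ode_step(x_next, t_next, t_cur)  # one step along the trajectory
        target = ema_student(x_cur, t_cur)       # f_{theta^-}(x_{t_n})
    # MSE stands in for the paper's distance function d(., .).
    return F.mse_loss(pred, target)
```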

Architecture-level Compression

reduce the cost of the diffusion network (the repeatedly invoked U-Net) with pruning / neural architecture search (NAS) techniques

Pruning

  • [Structural Pruning]: "Structural Pruning for Diffusion Models";

    • 2023/05 | NeurIPS23 | [Paper]
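
Structural pruning removes whole channels/filters rather than individual weights, so the pruned model actually runs faster. A generic L2-magnitude sketch with `torch.nn.utils.prune` (not the paper's diffusion-specific importance criterion):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_channels(model: nn.Module, amount: float = 0.3) -> nn.Module:
    """Zero out the `amount` fraction of output channels (whole filters,
    dim=0) with the smallest L2 norm in every Conv2d layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.ln_structured(module, name="weight",
                                amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # bake the mask into the weight
    return model
```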

Adaptive Architecture

adaptively skip parts of the architecture across timesteps

Token-level Compression

Token Reduction

save computation adaptively for different sampling conditions (noise/prompt/task)

  • [ToMe]: "Token Merging for Fast Stable Diffusion";
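
A heavily simplified, single-sequence sketch of the token-merging idea behind ToMe (not its exact bipartite soft matching, which also rescales attention for merged tokens): average away the `r` most similar token pairs so downstream attention/FFN layers process fewer tokens:

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Merge roughly the r most similar token pairs by averaging.

    x: (N, C) tokens from one attention block; returns ~(N - r, C).
    Token order is not preserved and collisions (several tokens picking
    the same partner) are resolved arbitrarily -- fine for a sketch.
    """
    a, b = x[::2], x[1::2]                          # two disjoint token sets
    sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T
    best_sim, best_b = sim.max(dim=-1)              # best partner in b per a-token
    src = best_sim.topk(min(r, a.shape[0])).indices # a-tokens to merge away
    merged = b.clone()
    merged[best_b[src]] = (a[src] + b[best_b[src]]) / 2  # average into partners
    keep = torch.ones(a.shape[0], dtype=torch.bool)
    keep[src] = False
    return torch.cat([a[keep], merged], dim=0)      # survivors + merged set
```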

Patched Inference

reduce the processing resolution

  • [PatchDiffusion]: "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models";

    • 2023/04 | NeurIPS23 | [Paper]
  • [MemEffPatchGen]: "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation";

    • 2023/04 | CVPR23W | [Paper]
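
An illustrative tiling sketch of patch-wise inference, assuming a shape-preserving `model` (e.g., a denoiser): peak activation memory then scales with the patch size rather than the full resolution. Real patch-based generators also condition on patch coordinates and train with patch-level objectives, omitted here:

```python
import torch

def patched_apply(model, x: torch.Tensor,
                  patch: int = 64, overlap: int = 8) -> torch.Tensor:
    """Run `model` tile-by-tile over overlapping patches and blend the
    overlaps by averaging."""
    _, _, h, w = x.shape
    out = torch.zeros_like(x)
    weight = torch.zeros(1, 1, h, w, device=x.device)
    step = patch - overlap
    for i in range(0, max(h - overlap, 1), step):
        for j in range(0, max(w - overlap, 1), step):
            hi, wj = min(i + patch, h), min(j + patch, w)
            out[:, :, i:hi, j:wj] += model(x[:, :, i:hi, j:wj])
            weight[:, :, i:hi, j:wj] += 1
    return out / weight  # average where tiles overlap
```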

Model Quantization

quantization & low-bit inference/training

  • [PTQD]: "PTQD: Accurate Post-Training Quantization for Diffusion Models";

    • 2023/05 | NeurIPS23 | [Paper]
  • [BiDiffusion]: "Binary Latent Diffusion";

    • 2023/04 | Arxiv2304 | [Paper]
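
A toy per-tensor symmetric uniform quantizer to make the setting concrete; diffusion-specific PTQ methods like PTQD additionally model and correct the quantization error that accumulates across denoising steps, which this sketch does not attempt:

```python
import torch

def quantize_tensor(w: torch.Tensor, n_bits: int = 8):
    """Symmetric uniform per-tensor quantization: map floats to int8
    with a single scale chosen from the largest magnitude."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor; the rounding error is the
    quantization noise that diffusion-aware PTQ methods account for."""
    return q.float() * scale
```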

Efficient Tuning

  • [DiffFit]: "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning";

    • 2023/04 | Arxiv2304 | [Paper]
    • Fine-tunes only biases, normalization layers, and newly added scale factors (a parameter-selection sketch follows this list)
  • [ParamEffTuningSummary]: "A Closer Look at Parameter-Efficient Tuning in Diffusion Models";

    • 2023/03 | Arxiv2303 | [Paper]
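
A rough sketch of DiffFit-style parameter selection; the learnable per-block scale factors the paper inserts are omitted, and the name-matching heuristic is an assumption that fits common U-Net/DiT implementations:

```python
import torch.nn as nn

def mark_difffit_trainable(model: nn.Module) -> None:
    """Freeze everything, then unfreeze only biases and normalization
    parameters, so that well under 1% of weights receive gradients."""
    for p in model.parameters():
        p.requires_grad = False
    for name, p in model.named_parameters():
        if name.endswith(".bias") or "norm" in name.lower():
            p.requires_grad = True
```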

Low-Rank

The LoRA family: adapt a frozen base model by adding trainable low-rank weight updates (a minimal sketch follows)
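
A minimal LoRA sketch: the frozen base weight is augmented with a trainable low-rank update `B @ A`, with `B` initialized to zero so training starts from the unmodified model. In diffusion models this is typically applied to the attention projections of the U-Net or DiT:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)        # start as an exact no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))
```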

System-level

GPU

Mobile

  • [SnapFusion]: "SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds";
    • Platform: iPhone 14 Pro, 1.84s
    • Model Evolution: 3.8x fewer parameters compared to SD-V1.5
    • Step Distillation into 8 steps
    • 2023/06 | Arxiv2306 | [Paper]

Related Resources

License

This list is under the Creative Commons license.