This repository is based on our survey, *Diffusion Model-Based Image Editing: A Survey*.
Yi Huang*, Jiancheng Huang*, Yifan Liu*, Mingfu Yan*, Jiaxi Lv*, Jianzhuang Liu*, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen
Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Adobe Inc., Apple Inc., Southern University of Science and Technology (SUSTech)
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research.
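As background for the methods indexed below, here is a minimal sketch of the standard DDPM formulation (Ho et al., 2020) that most of the surveyed editing methods build on. The notation ($x_t$, $\beta_t$, $\bar{\alpha}_t$, $\epsilon_\theta$) is the conventional one and is not taken from any particular paper in the tables.

```latex
% Forward process: corrupt an image x_0 with Gaussian noise over T steps,
% following a variance schedule beta_1, ..., beta_T.
q(x_t \mid x_{t-1}) = \mathcal{N}\bigl(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\bigr)

% Reverse process: a learned network denoises step by step; sampling from
% p_theta starting at pure noise x_T ~ N(0, I) produces an image.
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\bigl(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\bigr)

% Simplified training objective: predict the injected noise epsilon,
% where alpha_bar_t = prod_{s <= t} (1 - beta_s).
\mathcal{L}_{\mathrm{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0,\mathbf{I}),\ t}
  \Bigl[\bigl\|\epsilon - \epsilon_\theta\bigl(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\bigr)\bigr\|^2\Bigr]
```

The editing methods listed below differ mainly in how they steer this reverse process, e.g., via inverted latents, finetuned embeddings, modified attention maps, or mask guidance, which is what the categories in the survey organize.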
We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your work is relevant, please feel free to contact us.
📰 2024-03-06: We have established a template for paper submissions. To access it, click the New Issue button under Issues (or click here), then select the Paper Submission Form and complete it following the guidelines provided.
📰 2024-02-28: Our comprehensive survey paper, summarizing related methods published before February 1, 2024, is now available!
@article{huang2024diffusion,
  title={Diffusion Model-Based Image Editing: A Survey},
  author={Huang, Yi and Huang, Jiancheng and Liu, Yifan and Yan, Mingfu and Lv, Jiaxi and Liu, Jianzhuang and Xiong, Wei and Zhang, He and Chen, Shifeng and Cao, Liangliang},
  journal={arXiv preprint arXiv:2402.17525},
  year={2024}
}
Title | Pub | Release Date |
---|---|---|
Text-Driven Image Editing via Learnable Regions | arXiv 2023 | 2023.11 |
iEdit: Localised Text-guided Image Editing with Weak Supervision | arXiv 2023 | 2023.05 |
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation | arXiv 2023 | 2023.05 |
Title | Pub | Release Date |
---|---|---|
KV Inversion: KV embeddings learning for text-conditioned real image action editing | arXiv 2023 | 2023.09 |
Custom-Edit: Text-guided image editing with customized diffusion models | arXiv 2023 | 2023.05 |
UniTune: Text-driven image editing by fine tuning an image generation model on a single image | arXiv 2022 | 2022.10 |
Title | Pub | Release Date |
---|---|---|
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing | NeurIPS 2023 | 2023.09 |
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models | ICCV 2023 | 2023.05 |
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | CVPR 2023 | 2022.12 |
Null-text inversion for editing real images using guided diffusion models | CVPR 2023 | 2022.11 |
Title | Pub | Release Date |
---|---|---|
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | arXiv 2023 | 2023.05 |
Inversion-based creativity transfer with diffusion models | CVPR 2023 | 2022.11 |
Title | Pub | Release Date |
---|---|---|
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing | arXiv 2023 | 2023.11 |
MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models | arXiv 2023 | 2023.10 |
DragonDiffusion: Enabling drag-style manipulation on diffusion models | arXiv 2023 | 2023.07 |
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | arXiv 2023 | 2023.06 |
Delta denoising score | ICCV 2023 | 2023.04 |
Diffusion-based Image Translation using disentangled style and content representation | ICLR 2023 | 2022.09 |
Title | Pub | Release Date |
---|---|---|
Forgedit: Text Guided Image Editing via Learning and Forgetting | arXiv 2023 | 2023.09 |
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models | arXiv 2023 | 2023.05 |
SINE: Single image editing with text-to-image diffusion models | CVPR 2023 | 2022.12 |
Imagic: Text-Based Real Image Editing With Diffusion Models | CVPR 2023 | 2022.10 |
Title | Pub | Release Date |
---|---|---|
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques | arXiv 2023 | 2023.06 |
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation | arXiv 2023 | 2023.05 |
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions | arXiv 2023 | 2023.05 |
PRedItOR: Text guided image editing with diffusion prior | arXiv 2023 | 2023.02 |
Title | Pub | Release Date |
---|---|---|
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | arXiv 2023 | 2023.12 |
TF-ICON: Diffusion-based training-free cross-domain image composition | ICCV 2023 | 2023.07 |
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
Conditional Score Guidance for Text-Driven Image-to-Image Translation | NeurIPS 2023 | 2023.05 |
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing | ICCV 2023 | 2023.04 |
Localizing Object-level Shape Variations with Text-to-Image Diffusion Models | ICCV 2023 | 2023.03 |
Zero-shot image-to-image translation | SIGGRAPH 2023 | 2023.02 |
Shape-Guided Diffusion With Inside-Outside Attention | WACV 2024 | 2022.12 |
Plug-and-play diffusion features for text-driven image-to-image translation | CVPR 2023 | 2022.11 |
Prompt-to-prompt image editing with cross attention control | ICLR 2023 | 2022.08 |
Title | Pub | Release Date |
---|---|---|
ZONE: Zero-Shot Instruction-Guided Local Editing | CVPR 2024 | 2023.12 |
Watch your steps: Local image and scene editing by text instructions | arXiv 2023 | 2023.08 |
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
Differential Diffusion: Giving Each Pixel Its Strength | arXiv 2023 | 2023.06 |
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing | arXiv 2023 | 2023.06 |
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference | AAAI 2024 | 2023.05 |
Inpaint Anything: Segment Anything Meets Image Inpainting | arXiv 2023 | 2023.04 |
Region-aware diffusion for zero-shot text-driven image editing | CVM 2023 | 2023.02 |
Text-guided mask-free local image retouching | ICME 2023 | 2022.12 |
Blended diffusion for text-driven editing of natural images | CVPR 2022 | 2021.11 |
DiffEdit: Diffusion-based semantic image editing with mask guidance | ICLR 2023 | 2022.10 |
Blended latent diffusion | SIGGRAPH 2023 | 2022.06 |
Title | Pub | Release Date |
---|---|---|
Object-aware Inversion and Reassembly for Image Editing | arXiv 2023 | 2023.10 |
LEDITS: Real image editing with DDPM inversion and semantic guidance | arXiv 2023 | 2023.07 |
SEGA: Instructing diffusion using semantic dimensions | arXiv 2023 | 2023.01 |
The Stable Artist: Steering semantics in diffusion latent space | arXiv 2022 | 2022.12 |