This repository is based on our survey, *Diffusion Model-Based Image Editing: A Survey*.
Yi Huang*, Jiancheng Huang*, Yifan Liu*, Mingfu Yan*, Jiaxi Lv*, Jianzhuang Liu*, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen
Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Adobe Inc., Apple Inc., Southern University of Science and Technology (SUSTech)
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research.
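As background for the methods indexed below, here is a minimal sketch of the standard DDPM formulation (Ho et al., 2020) that most of the surveyed editing methods build on. The notation ($x_t$, $\beta_t$, $\bar{\alpha}_t$, $\epsilon_\theta$) is the conventional one and is not taken from any particular paper in the tables.

```latex
% Forward process: corrupt an image x_0 with Gaussian noise over T steps,
% following a variance schedule beta_1, ..., beta_T.
q(x_t \mid x_{t-1}) = \mathcal{N}\bigl(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\bigr)

% Reverse process: a learned network denoises step by step; sampling from
% p_theta starting at pure noise x_T ~ N(0, I) produces an image.
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\bigl(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\bigr)

% Simplified training objective: predict the injected noise epsilon,
% where alpha_bar_t = prod_{s <= t} (1 - beta_s).
\mathcal{L}_{\mathrm{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0,\mathbf{I}),\ t}
  \Bigl[\bigl\|\epsilon - \epsilon_\theta\bigl(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\bigr)\bigr\|^2\Bigr]
```

The editing methods listed below differ mainly in how they steer this reverse process, e.g., via inverted latents, finetuned embeddings, modified attention maps, or mask guidance, which is what the categories in the survey organize.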
We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your work is relevant, please feel free to contact us.
📰 2024-03-06: We have established a template for paper submissions. To access it, click the New Issue button under Issues (or click here), then select the Paper Submission Form and complete it following the guidelines provided.
📰 2024-02-28: Our comprehensive survey paper, summarizing related methods published before February 1, 2024, is now available!
@article{huang2024diffusion,
  title={Diffusion Model-Based Image Editing: A Survey},
  author={Huang, Yi and Huang, Jiancheng and Liu, Yifan and Yan, Mingfu and Lv, Jiaxi and Liu, Jianzhuang and Xiong, Wei and Zhang, He and Chen, Shifeng and Cao, Liangliang},
  journal={arXiv preprint arXiv:2402.17525},
  year={2024}
}
Title | Pub | Release Date |
---|---|---|
Text-Driven Image Editing via Learnable Regions | arXiv 2023 | 2023.11 |
iEdit: Localised Text-guided Image Editing with Weak Supervision | arXiv 2023 | 2023.05 |
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation | arXiv 2023 | 2023.05 |
Title | Pub | Release Date |
---|---|---|
KV Inversion: KV embeddings learning for text-conditioned real image action editing | arXiv 2023 | 2023.09 |
Custom-Edit: Text-guided image editing with customized diffusion models | arXiv 2023 | 2023.05 |
UniTune: Text-driven image editing by fine tuning an image generation model on a single image | arXiv 2022 | 2022.10 |
Title | Pub | Release Date |
---|---|---|
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing | NeurIPS 2023 | 2023.09 |
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models | ICCV 2023 | 2023.05 |
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | CVPR 2023 | 2022.12 |
Null-text inversion for editing real images using guided diffusion models | CVPR 2023 | 2022.11 |
Title | Pub | Release Date |
---|---|---|
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | arXiv 2023 | 2023.05 |
Inversion-based creativity transfer with diffusion models | CVPR 2023 | 2022.11 |
Title | Pub | Release Date |
---|---|---|
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing | arXiv 2023 | 2023.11 |
MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models | arXiv 2023 | 2023.10 |
DragonDiffusion: Enabling drag-style manipulation on diffusion models | arXiv 2023 | 2023.07 |
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | arXiv 2023 | 2023.06 |
Delta denoising score | ICCV 2023 | 2023.04 |
Diffusion-based Image Translation using disentangled style and content representation | ICLR 2023 | 2022.09 |
Title | Pub | Release Date |
---|---|---|
Forgedit: Text Guided Image Editing via Learning and Forgetting | arXiv 2023 | 2023.09 |
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models | arXiv 2023 | 2023.05 |
SINE: Single image editing with text-to-image diffusion models | CVPR 2023 | 2022.12 |
Imagic: Text-Based Real Image Editing With Diffusion Models | CVPR 2023 | 2022.10 |
Title | Pub | Release Date |
---|---|---|
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques | arXiv 2023 | 2023.06 |
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation | arXiv 2023 | 2023.05 |
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions | arXiv 2023 | 2023.05 |
PRedItOR: Text guided image editing with diffusion prior | arXiv 2023 | 2023.02 |
Title | Pub | Release Date |
---|---|---|
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | arXiv 2023 | 2023.12 |
TF-ICON: Diffusion-based training-free cross-domain image composition | ICCV 2023 | 2023.07 |
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
Conditional Score Guidance for Text-Driven Image-to-Image Translation | NeurIPS 2023 | 2023.05 |
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing | ICCV 2023 | 2023.04 |
Localizing Object-level Shape Variations with Text-to-Image Diffusion Models | ICCV 2023 | 2023.03 |
Zero-shot image-to-image translation | SIGGRAPH 2023 | 2023.02 |
Shape-Guided Diffusion With Inside-Outside Attention | WACV 2024 | 2022.12 |
Plug-and-play diffusion features for text-driven image-to-image translation | CVPR 2023 | 2022.11 |
Prompt-to-prompt image editing with cross attention control | ICLR 2023 | 2022.08 |
Title | Pub | Release Date |
---|---|---|
ZONE: Zero-Shot Instruction-Guided Local Editing | CVPR 2024 | 2023.12 |
Watch your steps: Local image and scene editing by text instructions | arXiv 2023 | 2023.08 |
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
Differential Diffusion: Giving Each Pixel Its Strength | arXiv 2023 | 2023.06 |
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing | arXiv 2023 | 2023.06 |
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference | AAAI 2024 | 2023.05 |
Inpaint Anything: Segment Anything Meets Image Inpainting | arXiv 2023 | 2023.04 |
Region-aware diffusion for zero-shot text-driven image editing | CVM 2023 | 2023.02 |
Text-guided mask-free local image retouching | ICME 2023 | 2022.12 |
Blended diffusion for text-driven editing of natural images | CVPR 2022 | 2021.11 |
DiffEdit: Diffusion-based semantic image editing with mask guidance | ICLR 2023 | 2022.10 |
Blended latent diffusion | SIGGRAPH 2023 | 2022.06 |
Title | Pub | Release Date |
---|---|---|
Object-aware Inversion and Reassembly for Image Editing | arXiv 2023 | 2023.10 |
LEDITS: Real image editing with DDPM inversion and semantic guidance | arXiv 2023 | 2023.07 |
SEGA: Instructing diffusion using semantic dimensions | arXiv 2023 | 2023.01 |
The Stable Artist: Steering semantics in diffusion latent space | arXiv 2022 | 2022.12 |