This repository is based on our survey Diffusion Model-Based Image Editing: A Survey.
Yi Huang*, Jiancheng Huang*, Yifan Liu*, Mingfu Yan*, Jiaxi Lv*, Jianzhuang Liu*, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen
Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Adobe Inc, Apple Inc, Southern University of Science and Technology (SUSTech)
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research.
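As a quick illustration of this core idea, here is a minimal sketch of the standard denoising-diffusion training objective. It is not specific to any method covered in the survey; `model` stands for any noise-prediction network (e.g., a U-Net), and the schedule values are typical defaults.

```python
# Minimal sketch: forward noising q(x_t | x_0) and the noise-prediction loss.
# Not taken from any particular surveyed method; `model` is any network
# that maps (noisy image, timestep) -> predicted noise.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

def forward_noise(x0, t, eps):
    """q(x_t | x_0): gradually corrupt a clean image x0 with Gaussian noise."""
    a = alphas_bar.to(x0.device)[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

def diffusion_loss(model, x0):
    """Standard simplified objective: predict the injected noise."""
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    eps = torch.randn_like(x0)
    x_t = forward_noise(x0, t, eps)
    return F.mse_loss(model(x_t, t), eps)
```

Sampling then runs this process in reverse, repeatedly denoising from pure Gaussian noise; editing methods surveyed here intervene in this reverse process (e.g., via inversion, attention control, or mask guidance).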
We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your studies are relevant, please feel free to contact us.
2024-03-22: The template for computing LMM Score using GPT-4V, along with a corresponding leaderboard comparing several leading methods, is released.
2024-03-14: Our benchmark EditEval_v1 is now released.
2024-03-06: We have established a template for paper submissions. To use it, click the New Issue button under Issues (or click here), select the Paper Submission Form, and complete it following the guidelines provided.
2024-02-28: Our comprehensive survey paper, summarizing related methods published before February 1, 2024, is now available.
If you find this work helpful in your research, please consider citing the paper and giving the repository a star:
@article{huang2024diffusion,
title={Diffusion Model-Based Image Editing: A Survey},
author={Huang, Yi and Huang, Jiancheng and Liu, Yifan and Yan, Mingfu and Lv, Jiaxi and Liu, Jianzhuang and Xiong, Wei and Zhang, He and Chen, Shifeng and Cao, Liangliang},
journal={arXiv preprint arXiv:2402.17525},
year={2024}
}
Title | Publication | Date |
---|---|---|
Text-Driven Image Editing via Learnable Regions | CVPR 2024 | 2023.11 |
iEdit: Localised Text-guided Image Editing with Weak Supervision | arXiv 2023 | 2023.05 |
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation | arXiv 2023 | 2023.05 |
Title | Publication | Date |
---|---|---|
Kv inversion: Kv embeddings learning for text-conditioned real image action editing | arXiv 2023 | 2023.09 |
Custom-edit: Text-guided image editing with customized diffusion models | CVPR workshop 2023 | 2023.05 |
Unitune: Text-driven image editing by fine tuning an image generation model on a single image | ACM TOG 2023 | 2022.10 |
Title | Publication | Date |
---|---|---|
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing | NeurIPS 2023 | 2023.09 |
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models | ICCV 2023 | 2023.05 |
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | CVPR 2023 | 2022.12 |
Null-text inversion for editing real images using guided diffusion models | CVPR 2023 | 2022.11 |
Title | Publication | Date |
---|---|---|
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | arXiv 2023 | 2023.05 |
Inversion-based creativity transfer with diffusion models | CVPR 2023 | 2022.11 |
Title | Publication | Date |
---|---|---|
StableDrag: Stable Dragging for Point-based Image Editing | arXiv 2024 | 2024.03 |
FreeDrag: Feature Dragging for Reliable Point-based Image Editing | CVPR 2024 | 2023.12 |
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing | CVPR 2024 | 2023.11 |
MagicRemover: Tuning-free Text-guided Image inpainting with Diffusion Models | arXiv 2023 | 2023.10 |
Dragondiffusion: Enabling drag-style manipulation on diffusion models | ICLR 2024 | 2023.07 |
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | CVPR 2024 | 2023.06 |
Delta denoising score | ICCV 2023 | 2023.04 |
Directed Diffusion: Direct Control of Object Placement through Attention Guidance | AAAI 2024 | 2023.02 |
Diffusion-based Image Translation using disentangled style and content representation | ICLR 2023 | 2022.09 |
Title | Publication | Date |
---|---|---|
Forgedit: Text Guided Image Editing via Learning and Forgetting | arXiv 2023 | 2023.09 |
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models | arXiv 2023 | 2023.05 |
Sine: Single image editing with text-to-image diffusion models | CVPR 2023 | 2022.12 |
Imagic: Text-Based Real Image Editing With Diffusion Models | CVPR 2023 | 2022.10 |
Title | Publication | Date |
---|---|---|
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques | arXiv 2023 | 2023.06 |
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation | arXiv 2023 | 2023.05 |
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions | arXiv 2023 | 2023.05 |
Preditor: Text guided image editing with diffusion prior | arXiv 2023 | 2023.02 |
Title | Publication | Date |
---|---|---|
ZONE: Zero-Shot Instruction-Guided Local Editing | CVPR 2024 | 2023.12 |
Watch your steps: Local image and scene editing by text instructions | arXiv 2023 | 2023.08 |
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
Differential Diffusion: Giving Each Pixel Its Strength | arXiv 2023 | 2023.06 |
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing | arXiv 2023 | 2023.06 |
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference | AAAI 2024 | 2023.05 |
Inpaint anything: Segment anything meets image inpainting | arXiv 2023 | 2023.04 |
Region-aware diffusion for zero-shot text-driven image editing | CVM 2023 | 2023.02 |
Text-guided mask-free local image retouching | ICME 2023 | 2022.12 |
Blended diffusion for text-driven editing of natural images | CVPR 2022 | 2021.11 |
DiffEdit: Diffusion-based semantic image editing with mask guidance | ICLR 2023 | 2022.10 |
Blended latent diffusion | SIGGRAPH 2023 | 2022.06 |
Title | Publication | Date |
---|---|---|
Object-aware Inversion and Reassembly for Image Editing | ICLR 2024 | 2023.10 |
Ledits: Real image editing with ddpm inversion and semantic guidance | arXiv 2023 | 2023.07 |
Sega: Instructing diffusion using semantic dimensions | NeurIPS 2023 | 2023.01 |
The stable artist: Steering semantics in diffusion latent space | arXiv 2022 | 2022.12 |
EditEval_v1 is a benchmark tailored for evaluating general diffusion model-based image editing algorithms. It contains 50 high-quality images selected from Unsplash, each accompanied by a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. The benchmark covers the seven most popular editing tasks across the semantic, stylistic, and structural editing categories defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download the dataset!
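Conceptually, each benchmark entry pairs an image with a source prompt, a target prompt, an editing instruction, and one of the seven task labels. The sketch below is purely illustrative: the field names and annotation filename are our own placeholders, not necessarily the layout of the released files — please consult the downloaded dataset for the actual format.

```python
# Hypothetical sketch of iterating over EditEval_v1 annotations.
# Field names and the annotation filename are placeholders only.
import json

TASKS = {
    "object addition", "object replacement", "object removal",
    "background change", "overall style change", "texture change", "action change",
}

def load_editeval(annotation_file="editeval_v1.json"):  # placeholder path
    with open(annotation_file) as f:
        entries = json.load(f)
    for e in entries:
        assert e["task"] in TASKS  # one of the seven editing tasks
        yield (e["image_path"], e["source_prompt"],
               e["target_prompt"], e["editing_instruction"])
```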
To make LMM Score easy to apply, we provide a comprehensive template for computing it with GPT-4V. The template comes with step-by-step instructions and all required materials. In addition, we have constructed a leaderboard comparing representative methods evaluated with LMM Score on our EditEval_v1 benchmark, which can be found here.
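For reference, the sketch below shows one way to send the template to GPT-4V programmatically via the OpenAI Python client with base64-encoded images. It is only a rough illustration: the actual scoring prompt should be taken from the released template, and the model name, prompt wording, and score parsing here are placeholders rather than part of our release.

```python
# Rough sketch of querying GPT-4V with the LMM Score template.
# The prompt text must come from the released template; model name is a placeholder.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def encode_image(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def lmm_score(source_image, edited_image, instruction, prompt_template):
    # prompt_template: the released LMM Score instructions, with the editing
    # instruction filled in.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder; use the vision model you have access to
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt_template.format(instruction=instruction)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(source_image)}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(edited_image)}"}},
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content  # parse the scores from this text
```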