This is the official PyTorch implementation of the paper "Region-Aware Diffusion for Zero-shot Text-driven Image Editing".

A Hugging Face demo is available at https://huggingface.co/spaces/alvanlii/RDM-Region-Aware-Diffusion-Model. Thanks to alvanlii!
Image manipulation under the guidance of textual descriptions has recently received a broad range of attention. In this study, we focus on the regional editing of images with the guidance of given text prompts. Different from current mask-based image editing methods, we propose a novel region-aware diffusion model (RDM) for entity-level image editing, which can automatically locate the region of interest and replace it following given text prompts. To strike a balance between image fidelity and inference speed, we design an intensive diffusion pipeline by combining latent-space diffusion with enhanced directional guidance. In addition, to preserve image content in non-edited regions, we introduce region-aware entity editing, which modifies the region of interest while preserving the out-of-interest region. We validate the proposed RDM against baseline methods through extensive qualitative and quantitative experiments. The results show that RDM outperforms previous approaches in terms of visual quality, overall harmonization, non-edited-region content preservation, and text-image semantic consistency.
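The region-aware editing described above can be sketched as a per-step mask blend: at each denoising step, the text-guided prediction is kept inside the region of interest, while the original image, noised to the matching level, is re-imposed outside it. Below is a minimal numpy sketch of this idea under simplified assumptions; the function names (`blended_denoise`, `noise_to_level`) and the toy `denoise_step` callable are illustrative stand-ins, not RDM's actual API or noise schedule.

```python
import numpy as np

def noise_to_level(x0, t, rng):
    """Toy forward process: mix the clean image with Gaussian noise at level t in [0, 1]."""
    return np.sqrt(1.0 - t) * x0 + np.sqrt(t) * rng.standard_normal(x0.shape)

def blended_denoise(x0, mask, denoise_step, steps=10, seed=0):
    """Edit only where mask == 1; re-impose the (noised) original elsewhere.

    x0:   original image, shape (H, W, C), values in [0, 1]
    mask: region of interest, shape (H, W, 1); 1 = edit, 0 = preserve
    denoise_step: callable (x_t, t) -> slightly less noisy sample
                  (stands in for the text-conditioned diffusion model)
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(x0.shape)  # start from pure noise
    for i in range(steps, 0, -1):
        t_cur, t_next = i / steps, (i - 1) / steps
        x = denoise_step(x, t_cur)                 # text-guided update (edited content)
        x_known = noise_to_level(x0, t_next, rng)  # original, noised to the post-step level
        x = mask * x + (1.0 - mask) * x_known      # keep the edit only inside the mask
    return x
```

At the final step `t_next` reaches 0, so the non-edited region is restored to the original pixels exactly, which is the content-preservation property the abstract describes.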
git clone https://github.com/haha-lisa/RDM-Region-Aware-Diffusion-Model
cd RDM-Region-Aware-Diffusion-Model
pip install -e .
Then install latent diffusion.

Download the pretrained checkpoints (bert, kl-f8, and diffusion) and put them into the folder ./
python run_edit.py --edit ./input_image/flower1.jpg --region ./input_image/flower1_region.png \
-fp "a flower" --batch_size 6 --num_batches 2 \
--text "a chrysanthemum" --prefix "test_flower"
If you find our work useful in your research, please consider citing:
@article{huang2023region,
  title={Region-aware diffusion for zero-shot text-driven image editing},
  author={Huang, Nisha and Tang, Fan and Dong, Weiming and Lee, Tong-Yee and Xu, Changsheng},
  journal={arXiv preprint arXiv:2302.11797},
  year={2023}
}
The code and the pretrained models in this repository are under the MIT license, as specified in the LICENSE file.