[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"

Text-Driven Image Editing via Learnable Regions
(CVPR 2024)

♥️ If you find our project helpful for your research, please give us a 🌟 and cite our paper 📑 : )

Official implementation of "Text-Driven Image Editing via Learnable Regions"

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang

Abstract: Language has emerged as a natural interface for image editing. In this paper, we introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches. Specifically, our approach leverages an existing pre-trained text-to-image model and introduces a bounding box generator to find the edit regions that are aligned with the textual prompts. We show that this simple approach enables flexible editing that is compatible with current image generation models, and is able to handle complex prompts featuring multiple objects, complex sentences, or long paragraphs. We conduct an extensive user study to compare our method against state-of-the-art methods. Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that align with the language descriptions provided. Our project webpage: https://yuanze-lin.me/LearnableRegions_page.

Method Overview

[Figure: method overview]

News

  • [2024.8.16] Released a demo on Colab. Have fun playing with it 🎨.
  • [2024.8.15] Code has been released.

Getting Started

🛠️ Environment Installation

To set up the environment, run the following commands in a shell:

git clone https://github.com/yuanze-lin/Learnable_Regions.git
cd Learnable_Regions
conda create -n LearnableRegion python==3.9 -y
source activate LearnableRegion
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
conda env update --file enviroment.yaml

This creates the LearnableRegion environment that we used.
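
To confirm that the installation worked, a quick check along the following lines may help. This is only a sketch: the helper name `check_environment` is hypothetical and not part of the repository. It reports whether PyTorch can be imported in the active environment and whether it can see a CUDA device.

```python
import importlib.util


def check_environment():
    """Report whether PyTorch is importable and whether CUDA is visible.

    Hypothetical helper for illustration; not part of the repository.
    """
    info = {"torch_installed": False, "torch_version": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this interpreter
    import torch

    info["torch_installed"] = True
    info["torch_version"] = str(torch.__version__)
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    # With the pinned install above, torch_version should start with "2.0.1".
    print(check_environment())
```

If `cuda_available` is reported as false, the CUDA 11.8 wheels may not match the installed driver, and editing will likely be much slower or fail.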

🎩 Edit Single Image

Run the following command to start editing a single image.

torchrun --nnodes=1 --nproc_per_node=1 train.py \
	--image_file_path images/1.png \
	--image_caption 'trees' \
	--editing_prompt 'a big tree with many flowers in the center' \
	--output_dir output/ \
	--draw_box \
	--lr 5e-3 \
	--max_window_size 15 \
	--per_image_iteration 10 \
	--epochs 1 \
	--num_workers 8 \
	--seed 42 \
	--pin_mem \
	--point_number 9 \
	--batch_size 1 \
	--save_path checkpoints/

The editing results are stored in $output_dir. Editing a single image takes about 4 minutes on one RTX 8000 GPU.

You can tune max_window_size, per_image_iteration and point_number to adjust the editing time and performance.
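
One way to explore these three parameters is to generate one command per combination and compare the outputs. The sketch below only assembles the command lines from the flags shown above. The function names, the candidate values, and the per-run output directory naming are assumptions made for illustration, and it assumes train.py can write to a fresh output directory.

```python
import itertools
import shlex


def build_command(max_window_size, per_image_iteration, point_number,
                  output_dir="output/"):
    """Assemble the single-image torchrun command as an argument list."""
    return [
        "torchrun", "--nnodes=1", "--nproc_per_node=1", "train.py",
        "--image_file_path", "images/1.png",
        "--image_caption", "trees",
        "--editing_prompt", "a big tree with many flowers in the center",
        "--output_dir", output_dir,
        "--draw_box",
        "--lr", "5e-3",
        "--max_window_size", str(max_window_size),
        "--per_image_iteration", str(per_image_iteration),
        "--epochs", "1",
        "--num_workers", "8",
        "--seed", "42",
        "--pin_mem",
        "--point_number", str(point_number),
        "--batch_size", "1",
        "--save_path", "checkpoints/",
    ]


def sweep_commands(window_sizes, iterations, point_numbers):
    """Yield one command per combination, each with its own output directory."""
    for w, it, p in itertools.product(window_sizes, iterations, point_numbers):
        out = f"output/w{w}_it{it}_p{p}/"  # hypothetical naming scheme
        yield build_command(w, it, p, output_dir=out)


if __name__ == "__main__":
    # The candidate values are illustrative, not recommended settings.
    for cmd in sweep_commands([10, 15], [5, 10], [9]):
        # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping each run in its own directory makes it easier to compare results side by side without overwriting earlier outputs.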

The hyper-parameters introduced by our method are:

"image_caption": the caption of the input image; in our paper we simply use the class name.
"editing_prompt": the editing prompt for manipulating the input image.
"max_window_size": the maximum anchor bounding box size.
"per_image_iteration": the number of training iterations for each image.
"point_number": the number of sampled anchor points.
"draw_box": whether to draw the bounding boxes on the results for visualization; the visualizations are saved to $output_dir/boxes.

👾 Edit Multiple Images

Run the following command to start editing multiple images simultaneously.

torchrun --nnodes=1 --nproc_per_node=2 train.py \
	--image_dir_path images/ \
	--output_dir output/ \
	--json_file images.json \
	--draw_box \
	--lr 5e-3 \
	--max_window_size 15 \
	--per_image_iteration 10 \
	--epochs 1 \
	--num_workers 8 \
	--seed 42 \
	--pin_mem \
	--point_number 9 \
	--batch_size 1 \
	--save_path checkpoints/ 

☕ How to Edit Custom Images?

Edit a single custom image: use the command from Edit Single Image and change image_file_path, image_caption and editing_prompt accordingly.

Edit multiple custom images: follow the structure of images.json to prepare your own file. Each key is an input image's name, and its values are the class/caption of that image and the editing prompt, in that order. Then run the command from Edit Multiple Images.
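
A prompt file for custom images could be generated with a short script like the one below. Treat it as a sketch under stated assumptions: the helper `write_prompt_file`, the file name `my_images.json`, and the second example entry are made up for illustration. The layout (image name as key, a two-element list of caption and editing prompt as value) is inferred from the description above, so compare it with the repository's images.json before relying on it.

```python
import json
from pathlib import Path


def write_prompt_file(entries, path):
    """Write {image_name: [caption, editing_prompt]} to a JSON file.

    The two-element list layout is an assumption; check it against the
    repository's images.json.
    """
    data = {}
    for image_name, caption, editing_prompt in entries:
        if image_name in data:
            raise ValueError(f"duplicate image name: {image_name}")
        data[image_name] = [caption, editing_prompt]
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data


if __name__ == "__main__":
    # The second entry is a made-up example.
    write_prompt_file(
        [
            ("1.png", "trees", "a big tree with many flowers in the center"),
            ("my_photo.png", "dog", "a dog wearing a red scarf"),
        ],
        "my_images.json",
    )
```

The resulting file would then be passed through --json_file, with --image_dir_path pointing at the directory that holds the listed images.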

Results Using Diverse Prompts

[Figure: results using diverse prompts]

Additional Results

[Figure: additional results]

Citation

If you find our work useful in your research or applications, please consider citing our paper using the following BibTeX:

@inproceedings{lin2024text,
  title={Text-driven image editing via learnable regions},
  author={Lin, Yuanze and Chen, Yi-Wen and Tsai, Yi-Hsuan and Jiang, Lu and Yang, Ming-Hsuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7059--7068},
  year={2024}
}