Attention Refocusing

[Website][Demo]

This is the official implementation of the paper "Grounded Text-to-Image Synthesis with Attention Refocusing".

Setup

conda create --name ldm_layout python==3.8.0
conda activate ldm_layout
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
pip install git+https://github.com/CompVis/taming-transformers.git
pip install git+https://github.com/openai/CLIP.git

Inference

Download the GLIGEN model and put it in gligen_checkpoints.

Run with the HRS/DrawBench prompts:

python guide_gligen.py --ckpt [model_checkpoint]  --file_save [save_path] \
                       --type [category] --box_pickle [saved_boxes] --use_gpt4

where:

  • --ckpt: Path to the GLIGEN checkpoint
  • --file_save: Path to save the generated images
  • --type: The category to test (options include counting, spatial, color, size)
  • --box_pickle: Path to save the generated layout from GPT-4
  • --use_gpt4: Use GPT-4 to generate the layout. If you pass this flag, set your OpenAI API key first:
export OPENAI_API_KEY='your-api-key'
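
A small pre-flight check can catch a missing key before a long run starts. The following is a minimal sketch, not part of this repository: the helper name `require_openai_key` is hypothetical, and it relies only on the `OPENAI_API_KEY` variable shown above.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key from the environment, or raise if it is unset.

    `require_openai_key` is a hypothetical helper, not a function from this repository.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before passing --use_gpt4"
        )
    return key
```

Calling `require_openai_key()` at the top of a wrapper script fails fast with a readable message instead of an API error in the middle of generation.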

For instance, to generate images according to the layouts and prompts of the counting category:

python guide_gligen.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin --file_save counting_500 \
                       --type counting --box_pickle ../data_evaluate_LLM/gpt_generated_box/counting.p
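
To cover all four categories in one pass, the command above can be assembled in a loop. This is a sketch under stated assumptions rather than a script shipped with the repository: the output folder name `<category>_out` and the layout file name `<category>.p` are guesses for every category except counting, and `build_command` is a hypothetical helper. The script only prints the commands so they can be checked before anything runs.

```python
import shlex

# The four categories listed for --type above.
CATEGORIES = ["counting", "spatial", "color", "size"]


def build_command(category, ckpt, box_dir, use_gpt4=False):
    """Build the guide_gligen.py argument list for one category.

    Assumptions (not confirmed by this README): the output folder name
    `<category>_out`, and the layout file name `<category>.p` for
    categories other than counting.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    cmd = [
        "python", "guide_gligen.py",
        "--ckpt", ckpt,
        "--file_save", f"{category}_out",
        "--type", category,
        "--box_pickle", f"{box_dir}/{category}.p",
    ]
    if use_gpt4:
        cmd.append("--use_gpt4")
    return cmd


if __name__ == "__main__":
    for category in CATEGORIES:
        cmd = build_command(
            category,
            ckpt="gligen_checkpoints/diffusion_pytorch_model.bin",
            box_dir="../data_evaluate_LLM/gpt_generated_box",
        )
        # Print instead of executing; to run a command, pass the list to
        # subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces.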

To run with user input text prompts:

export OPENAI_API_KEY='your-api-key'
python inference.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin

We provide the GPT-4-generated layouts for the HRS and DrawBench benchmarks in HRS boxes and DrawBench boxes.
We also provide generated images from our method and from the baselines (Stable Diffusion, Attend-and-Excite, MultiDiffusion, Layout-guidance, and GLIGEN) here.
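
Before using one of the provided layout files, it can help to look at what it contains. The internal structure of these pickle files is not described in this README, so the sketch below makes no assumption about it and only reports the top-level Python type and the number of entries. `summarize_layout_file` is a hypothetical helper name.

```python
import pickle


def summarize_layout_file(path):
    """Load a pickle file and return (type name, number of entries or None)."""
    # Only unpickle files from a source you trust: unpickling can execute
    # arbitrary code.
    with open(path, "rb") as f:
        data = pickle.load(f)
    size = len(data) if hasattr(data, "__len__") else None
    return type(data).__name__, size
```

For example, `summarize_layout_file("../data_evaluate_LLM/gpt_generated_box/counting.p")` would report the container type of the counting layouts, which is a reasonable starting point for exploring the entries interactively.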

Evaluation

To set up the environment, download the detector models, and run the evaluation for each category, see the evaluation instructions.

Attention-refocusing with other baselines

ControlNet + attention-refocusing

Acknowledgments

This project is built on the following resources:

  • GLIGEN: Our code is built upon the foundational work provided by GLIGEN.

  • HRS: The evaluation component of our project has been adapted from HRS.