Introduction

Built upon GPT-4V(ision), Idea2Img is a multimodal iterative self-refinement system that enhances any text-to-image (T2I) model for automatic image design and generation. It enables a range of new image creation functionalities together with better visual quality.
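The iterative self-refinement idea can be pictured as a loop of propose, render, select, and critique. The Python sketch below is a generic, hypothetical illustration of such a loop, not the Idea2Img implementation: refine, propose, generate, score, and critique are illustrative names. In practice the proposing, selecting, and critiquing roles would be played by GPT-4V, while generate would wrap the T2I model.

```python
from typing import Callable, List, Tuple


def refine(
    idea: str,
    propose: Callable[[str, List[str]], List[str]],  # hypothetical: LMM writes T2I prompts
    generate: Callable[[str], str],                  # hypothetical: T2I model renders a draft
    score: Callable[[str, str], float],              # hypothetical: LMM rates a draft vs. the idea
    critique: Callable[[str, str], str],             # hypothetical: LMM explains what to fix
    max_rounds: int = 3,
) -> Tuple[str, str, List[str]]:
    """Generic self-refinement loop: propose prompts, render drafts,
    keep the best draft seen so far, and feed critiques into the next round."""
    feedback: List[str] = []
    best_prompt, best_image, best_score = "", "", float("-inf")
    for _ in range(max_rounds):
        # Candidate prompts are conditioned on all feedback collected so far.
        prompts = propose(idea, feedback)
        # Render one draft per prompt and score each against the original idea.
        scored = [(score(idea, img), p, img) for p, img in ((p, generate(p)) for p in prompts)]
        s, prompt, image = max(scored, key=lambda t: t[0])
        # Keep the best draft across all rounds, not just the latest one.
        if s > best_score:
            best_prompt, best_image, best_score = prompt, image, s
        # Reflection on the selected draft guides the next round of prompts.
        feedback.append(critique(idea, image))
    return best_prompt, best_image, feedback
```

Passing each role in as a callable keeps the loop independent of any particular multimodal model or T2I backend, so toy stand-ins can replace the real models when testing the control flow.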

Prerequisites

Obtain an API key for the public OpenAI GPT-4V API, and set up inference for your chosen T2I model accordingly (e.g., SDXL).
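As one possible way to set up SDXL inference, the sketch below uses the Hugging Face diffusers StableDiffusionXLPipeline. It assumes torch and diffusers are installed and a CUDA GPU is available. The helper names load_sdxl, generate, and require_api_key, and the OPENAI_API_KEY environment variable, are hypothetical conveniences; how idea2img_pipeline.py itself loads its T2I model and key may differ.

```python
import os


def load_sdxl(model_id: str = "stabilityai/stable-diffusion-xl-base-1.0", device: str = "cuda"):
    """Load an SDXL text-to-image pipeline (hypothetical helper).

    torch and diffusers are imported lazily so this module can be imported
    on machines without them installed."""
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    )
    return pipe.to(device)


def generate(pipe, prompt: str, out_path: str) -> str:
    """Render one image for a prompt and save it to out_path (hypothetical helper)."""
    image = pipe(prompt=prompt).images[0]
    image.save(out_path)
    return out_path


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the OpenAI key from the environment instead of hard-coding it (assumption)."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} to your OpenAI API key")
    return key
```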

Installation

Clone the repository

git clone https://github.com/zyang-ur/idea2img.git

Running

Inference prompts are read from the file passed via --testfile. Within a prompt, <IMG> is a separator token inserted between two images, or between an image and text.
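A minimal sketch of handling test-file prompts, assuming one prompt per line with its segments (image paths or text) joined by <IMG>. The line layout, the extension-based image detection, and the helper names split_prompt, join_prompt, classify, and read_testfile are assumptions for illustration; check testsample.txt for the exact format the pipeline expects.

```python
from typing import List, Tuple

SEP = "<IMG>"  # separator token described above
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")  # assumption: images are given as file paths


def split_prompt(line: str) -> List[str]:
    """Split one test-file line on <IMG> into its non-empty, whitespace-trimmed segments."""
    return [seg.strip() for seg in line.rstrip("\n").split(SEP) if seg.strip()]


def join_prompt(segments: List[str]) -> str:
    """Build a test-file line from image-path and text segments."""
    return SEP.join(segments)


def classify(segments: List[str]) -> List[Tuple[str, str]]:
    """Tag each segment as 'image' or 'text' by file extension (heuristic assumption)."""
    return [("image" if s.lower().endswith(IMAGE_EXTS) else "text", s) for s in segments]


def read_testfile(path: str) -> List[List[str]]:
    """Read all non-blank lines of a test file as lists of segments."""
    with open(path, encoding="utf-8") as f:
        return [split_prompt(line) for line in f if line.strip()]
```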

mkdir output
python idea2img_pipeline.py --api_key OAI_GPT4V_Key --testfile testsample.txt --fewshot --select_fewshot
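The same command can also be driven from Python, for example to keep the API key out of shell history. This hypothetical wrapper (build_command and run are illustrative names) reads the key from an OPENAI_API_KEY environment variable, which is an assumption since the pipeline itself only takes --api_key, and it treats --fewshot and --select_fewshot as independent on/off switches, which is also an assumption.

```python
import os
import subprocess
import sys
from typing import List


def build_command(testfile: str, api_key: str,
                  fewshot: bool = True, select_fewshot: bool = True) -> List[str]:
    """Assemble the argument list for idea2img_pipeline.py using only the flags shown above."""
    cmd = [sys.executable, "idea2img_pipeline.py", "--api_key", api_key, "--testfile", testfile]
    if fewshot:
        cmd.append("--fewshot")
    if select_fewshot:
        cmd.append("--select_fewshot")
    return cmd


def run(testfile: str = "testsample.txt") -> None:
    """Create the output folder (like mkdir output) and launch the pipeline."""
    api_key = os.environ["OPENAI_API_KEY"]  # assumption: key stored in this variable
    os.makedirs("output", exist_ok=True)
    subprocess.run(build_command(testfile, api_key), check=True)
```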

Results

Generated results and intermediate steps are saved to the output folder.

Citation

@article{yang2023idea2img,
  title={Idea2Img: Iterative Self-Refinement with {GPT-4V(ision)} for Automatic Image Design and Generation},
  author={Yang, Zhengyuan and Wang, Jianfeng and Li, Linjie and Lin, Kevin and Lin, Chung-Ching and Liu, Zicheng and Wang, Lijuan},
  journal={arXiv preprint arXiv:2310.08541},
  year={2023}
}