sd-webui-rpg-diffusionmaster

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Note:

This extension is not being actively developed because my personal focus and interests have shifted. In addition, the original RPG-DiffusionMaster project has had no recent feature changes.

RPG-DiffusionMaster Extension for Stable Diffusion WebUI

This repository hosts an extension for Stable Diffusion WebUI that integrates the functionalities of RPG-DiffusionMaster. It brings additional changes and enhancements, enabling users of WebUI to interact with RPG-DiffusionMaster more seamlessly.

For more information, check the official repo or the following paper:

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui
Affiliations: Peking University, Stanford University, Pika Labs

Introduction

This extension is in an early phase of development. It uses LLMs (such as GPT-4 or Gemini Pro) for regional planning, then passes the split ratios and regional prompts generated by the LLM to Regional Prompter for image generation, as the official repository does.
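As a rough illustration of that hand-off, the sketch below joins per-region prompts into a single prompt string. It assumes Regional Prompter's convention of separating regions with the `BREAK` keyword and a simple comma-separated split ratio with one entry per region. The function name `build_regional_prompt` is hypothetical and is not taken from this extension's code.

```python
def build_regional_prompt(regional_prompts, split_ratio):
    """Join per-region prompts with BREAK, one region per ratio entry.

    Hypothetical helper for illustration only; it handles just the
    simple one-dimensional case such as split_ratio = "1,2,1".
    """
    regions = [p.strip() for p in regional_prompts if p.strip()]
    ratios = [r.strip() for r in split_ratio.split(",") if r.strip()]
    if len(regions) != len(ratios):
        raise ValueError(
            f"{len(regions)} regional prompts but {len(ratios)} ratio entries"
        )
    # Each region goes on its own line, separated by the BREAK keyword.
    return " BREAK\n".join(regions)


if __name__ == "__main__":
    print(build_regional_prompt(["a red car", "a blue sky"], "1,2"))
```

Checking that the number of regions matches the number of ratio entries catches a mismatched LLM plan before it reaches image generation.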

Installation

Before installing this extension, make sure the Regional Prompter extension is already installed. This extension has not yet been added to the WebUI extensions index, so it must be installed manually from its repository URL on the WebUI Extensions tab.

Usage

  1. Navigate to the txt2img tab.
  2. Choose RPG DiffusionMaster from the Script dropdown menu.
  3. Select your desired LLM and configure the settings for RPG-DiffusionMaster.
  4. Press the "Apply to Prompt" button and wait briefly as the extension processes the prompt through the LLM and adjusts the Regional Prompter configurations accordingly.
  5. Review the adjusted settings and the final prompt in the Prompt textbox. You can then modify parameters like image size, CFG Scale, Steps, etc., before generating your images.

To-Do List 💪

  • Integrate local LLM support.

Differences from the Official Implementation

  • Adds support for the Azure OpenAI GPT-4 model and Gemini Pro.
  • Alters the logic to enhance stability when extracting regional prompts.
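To give a sense of what tolerant extraction can look like, here is a minimal sketch that pulls a split ratio and regional prompts out of an LLM response. The response layout (a "split ratio:" line followed by a "Regional Prompt:" section whose regions are separated by `BREAK`) is an assumption made for illustration, and `extract_plan` is a hypothetical name. This is not the extension's actual parsing logic.

```python
import re


def extract_plan(response):
    """Return (split_ratio, regional_prompts) from an LLM response.

    Assumed format, for illustration only:
        Final split ratio: 1, 1
        Regional Prompt: <region 1> BREAK <region 2>
    """
    # Case-insensitive and whitespace-tolerant; the ratio is limited to one line.
    ratio_match = re.search(
        r"split ratio\s*:\s*([0-9.,; \t]+)", response, re.IGNORECASE
    )
    prompt_match = re.search(
        r"regional prompts?\s*:\s*(.+)", response, re.IGNORECASE | re.DOTALL
    )
    if not ratio_match or not prompt_match:
        raise ValueError("could not find split ratio and regional prompt")

    # Remove inner whitespace and stray trailing punctuation from the ratio.
    ratio = re.sub(r"\s+", "", ratio_match.group(1)).strip(",;.")
    regions = [
        part.strip()
        for part in re.split(r"\bBREAK\b", prompt_match.group(1))
        if part.strip()
    ]
    if not ratio or not regions:
        raise ValueError("empty split ratio or regional prompt")
    return ratio, regions


if __name__ == "__main__":
    text = "Final split ratio: 1, 1\nRegional Prompt: a cat BREAK\na dog\n"
    print(extract_plan(text))
```

Raising an error on a malformed response, rather than returning partial data, lets the caller retry the LLM request or show a clear message.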

Acknowledgements

A huge thank you to Ling Yang for the foundational RPG-DiffusionMaster implementation, and to AUTOMATIC1111 and regional-prompter for their exceptional contributions and codebases.