stable-diffusion-xl-demo

A gradio web UI demo for Stable Diffusion XL 1.0, with refiner and MultiGPU support

title: Stable Diffusion XL 1.0
emoji: 🔥
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 3.11.0
app_file: app.py
pinned: true
license: mit

StableDiffusion XL Gradio Demo WebUI

This is a Gradio web UI demo for Stable Diffusion XL 1.0. It loads both the base and the refiner models.

This is forked from StableDiffusion v2.1 Demo WebUI. Refer to the git commits to see the changes.

Update 🔥🔥🔥: Latent consistency model (LCM) LoRA is supported and enabled by default (controlled by ENABLE_LCM)! Turn on USE_SSD to use SSD-1B for even faster generation (4.9 sec/image on a free Colab T4 without additional optimizations)! The Colab notebook has been updated to use this by default.

Update 🔥🔥🔥: Check out our work LLM-grounded Diffusion (LMD), which introduces LLMs into the diffusion world and achieves much better prompt understanding compared to the standard Stable Diffusion without any fine-tuning! LMD with SDXL is supported on our Github repo and a demo with SD is available.

Update: SDXL 1.0 is released and our Web UI demo supports it! No application is needed to get the weights! Launch the Colab notebook to get started. You can run this demo on Colab for free, even on a T4.

Update: Multiple GPUs are supported. Set MULTI_GPU=True to spread the workload across GPUs using data parallelism.
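To illustrate what data parallelism means here, the sketch below divides a batch of images as evenly as possible among the available GPUs, so that each GPU generates its own share with a full copy of the model. The function name split_batch is hypothetical and this is not necessarily how app.py divides the work.

```python
from typing import List


def split_batch(num_images: int, num_gpus: int) -> List[int]:
    """Return how many images each GPU would generate (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, extra = divmod(num_images, num_gpus)
    # The first `extra` GPUs take one more image so the counts sum to num_images.
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


print(split_batch(8, 2))  # [4, 4]
print(split_batch(5, 4))  # [2, 1, 1, 1]
```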

SDXL with SSD-1B, LCM LoRA

Examples

Update: See a more comprehensive comparison with 1200+ images here. Both SD XL and SD v2.1 are benchmarked on prompts from StableStudio.

Left: SDXL. Right: SD v2.1.

Without any tuning, SDXL generates much better images compared to SD v2.1!

Example 1

Example 2

Example 3

Example 4

Example 5

Installation

With torch 2.0.1 installed, we also need to install:

pip install accelerate transformers invisible-watermark "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25" safetensors "gradio==3.11.0"
pip install git+https://github.com/huggingface/diffusers.git

Launching

The weights are free to download and no application form is needed now. Leaked weights seem to be available on reddit, but I have not used/tested them.

There are two ways to load the weights. Option 1 works out of the box (no manual download needed). If you prefer loading from a local repo, use Option 2.

Option 1

Run the command to automatically set up the weights:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python app.py

Option 2

If you have cloned both repos (base, refiner) locally (please change the path_to_sdxl):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 SDXL_MODEL_DIR=/path_to_sdxl python app.py

Note that stable-diffusion-xl-base-1.0 and stable-diffusion-xl-refiner-1.0 should be placed in the same directory. The path of that directory should replace /path_to_sdxl.
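The two options can be pictured as a small path-resolution step. The sketch below is an assumption about how such a lookup might work, not a copy of app.py: when SDXL_MODEL_DIR is set it points at the two local folders, otherwise it falls back to the Hugging Face Hub model IDs. The helper name resolve_model_paths is hypothetical.

```python
import os

BASE_NAME = "stable-diffusion-xl-base-1.0"
REFINER_NAME = "stable-diffusion-xl-refiner-1.0"


def resolve_model_paths(model_dir=None):
    """Return (base, refiner) locations for the two models (hypothetical helper)."""
    if model_dir:
        # Option 2: both repos are cloned side by side inside model_dir.
        return (
            os.path.join(model_dir, BASE_NAME),
            os.path.join(model_dir, REFINER_NAME),
        )
    # Option 1: use the Hub IDs so the weights are downloaded automatically.
    return "stabilityai/" + BASE_NAME, "stabilityai/" + REFINER_NAME


base, refiner = resolve_model_paths(os.environ.get("SDXL_MODEL_DIR"))
print(base)
print(refiner)
```

Either returned string could then be passed to a loader such as from_pretrained in diffusers, which accepts both local paths and Hub IDs.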

torch.compile support

Turning on torch.compile makes overall inference faster. However, it adds overhead to the first run, which has to wait for compilation to finish.
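As a rough sketch of the toggle, the helper below wraps a module with torch.compile only when asked. The name maybe_compile is hypothetical, and which module to compile (for example the pipeline's UNet) is an assumption, not something this README specifies.

```python
def maybe_compile(module, enabled: bool):
    """Return a compiled module if enabled, else the module unchanged (hypothetical helper)."""
    if not enabled:
        return module
    # Imported lazily so the helper can be loaded without torch when compilation is off.
    import torch

    # Compilation happens lazily: the first forward pass pays the compile cost.
    return torch.compile(module)
```

A possible use, assuming pipe is an already loaded diffusers pipeline: pipe.unet = maybe_compile(pipe.unet, enabled=True).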

To save memory

  1. Turn on pipe.enable_model_cpu_offload() and turn off pipe.to("cuda") in app.py.
  2. Turn off refiner by setting enable_refiner to False.
  3. More ways to save memory and make things faster.
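The first item above amounts to choosing between two placement strategies. A minimal sketch, assuming pipe is a diffusers pipeline object (both enable_model_cpu_offload() and to() are diffusers pipeline methods); the helper name place_pipeline is hypothetical:

```python
def place_pipeline(pipe, offload: bool, device: str = "cuda"):
    """Either offload the pipeline's models to CPU between uses or keep it on the GPU."""
    if offload:
        # Saves GPU memory: sub-models are moved to the GPU only while they run.
        pipe.enable_model_cpu_offload()
    else:
        # Fastest option when the whole pipeline fits in GPU memory.
        pipe.to(device)
    return pipe
```

The two calls are alternatives: with offloading enabled, the pipeline should not also be moved to the GPU with pipe.to("cuda").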

Several options through environment variables

  • USE_SSD: use segmind/SSD-1B. This is a distilled SDXL model that is faster. This is disabled by default.
  • ENABLE_LCM: use LCM LoRA. This is enabled by default.
  • SDXL_MODEL_DIR: load SDXL locally.
  • ENABLE_REFINER=true/false turn on/off the refiner (refiner refines the generation). The refiner is disabled by default if LCM LoRA or SSD model is enabled.
  • OFFLOAD_BASE and OFFLOAD_REFINER can be set to true/false to enable/disable model offloading (model offloading saves memory at the cost of slowing down generation).
  • OUTPUT_IMAGES_BEFORE_REFINER=true/false outputs images both before and after the refiner stage. Useful if the refiner is enabled.
  • SHARE=true/false creates public link (useful for sharing and on colab)
  • MULTI_GPU=true/false enables data parallelism across multiple GPUs.
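The interaction between the defaults can be sketched as follows. This is an illustration under stated assumptions, not the code in app.py: the helper names env_flag and read_options are hypothetical, accepting "1" and "yes" as true is an assumption, and the refiner defaulting to enabled when neither LCM LoRA nor SSD is active is inferred from the description above.

```python
import os


def env_flag(name, default, env=None):
    """Read a true/false environment variable, falling back to default if unset."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes")


def read_options(env=None):
    """Resolve the three options whose defaults are stated in the list above."""
    use_ssd = env_flag("USE_SSD", False, env)
    enable_lcm = env_flag("ENABLE_LCM", True, env)
    # Refiner is off by default when LCM LoRA or SSD is on; an explicit value wins.
    enable_refiner = env_flag("ENABLE_REFINER", not (use_ssd or enable_lcm), env)
    return {
        "USE_SSD": use_ssd,
        "ENABLE_LCM": enable_lcm,
        "ENABLE_REFINER": enable_refiner,
    }


print(read_options({}))
print(read_options({"ENABLE_LCM": "false"}))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment.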

If you enjoy this demo, please give this repo a star ⭐.