The focus of this fork of Stable Diffusion is to improve:
- Workflow
  - Sensible GUI with shortcuts and launch arguments at your fingertips
  - Quicker iteration times (the model stays in memory, among other improvements)
- Determinism
  - Per-sample seeding with singly indexable sub-samples (see the sketch after this list)
  - Sequential seed stepping per iteration
- Usability
  - Accessibility through the UI
  - More concise and fool-proof installation
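As a rough sketch of the per-sample seeding idea (illustrative only, not this fork's exact implementation), each sample's initial noise can be drawn from its own seeded generator, with seeds stepping sequentially from batch to batch so that any single sub-sample can be regenerated by index:

```python
# Illustrative sketch of per-sample deterministic seeding; not the fork's actual code.
import torch

def seeds_for_batch(base_seed, iteration, batch_size):
    # Sequential stepping: iteration k starts at base_seed + k * batch_size,
    # and sample j within that batch uses (start + j).
    start = base_seed + iteration * batch_size
    return [start + j for j in range(batch_size)]

def initial_latent(seed, shape=(4, 64, 64)):
    # One generator per sample keeps each sub-sample independently reproducible.
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=gen)

# Regenerate only sample 2 of iteration 0 without redoing the whole batch.
seeds = seeds_for_batch(base_seed=42, iteration=0, batch_size=4)
latent_for_sample_2 = initial_latent(seeds[2])
```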
Some things to note, which should be common sense; by downloading these files, you hereby agree:
- I'm not responsible in any way for what you choose to generate with this, nor do I own any such product.
- Beware, certain model weights you download from various sources may be capable of producing disturbing imagery.
- You must read and understand the relevant license of any model you use.
- This is a hacked-together work-in-progress.
- Expect things to break. Frequently.
- Be kind and collaborative in discussions online about Stable Diffusion, and other similar tools.
You will need a modern NVIDIA GPU with more than 10 GB of VRAM.
Install Miniconda3, a minimal Python environment manager, for Windows.
From the Start Menu, run Anaconda Prompt (miniconda3); this is the shell in which you should run all subsequent commands.
Create the environment (this automatically fetches the dependencies) from the root of this repository:
conda env create -f environment.yaml
conda activate ldm
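Before continuing, a quick sanity check (a hypothetical `check_gpu.py`, run inside the activated `ldm` environment) can confirm that PyTorch was installed with CUDA support and can see your GPU:

```python
# Quick sanity check: confirms PyTorch sees a CUDA-capable GPU and reports its VRAM.
import torch

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1024**3:.1f} GB VRAM")
```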
Place your obtained `model.ckpt` file in a new folder: `.\stable-diffusion-improvements\models\ldm\stable-diffusion-v1`. If it is not named `model.ckpt` and is instead something like `sd-v1-4.ckpt`, just rename it.
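If the scripts later complain about a missing checkpoint, a small check like the following (the path is assumed relative to the repository root; adjust if your layout differs) can confirm the file landed in the right place:

```python
# Optional check: confirms the checkpoint file is present where the scripts expect it.
from pathlib import Path

ckpt = Path("models") / "ldm" / "stable-diffusion-v1" / "model.ckpt"
if ckpt.exists():
    print(f"Found {ckpt} ({ckpt.stat().st_size / 1024**3:.1f} GB)")
else:
    print(f"Missing {ckpt} - check the folder and file names.")
```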
Using the interactive mode is straightforward. With the `ldm` conda environment active, simply call:
python scripts\txt2img.py --interactive
Any classic arguments passed to `txt2img.py` will appear in the interactive view as the default values of the parameter textboxes and checkboxes. Wait for the Generate button to become active, then create to your heart's content.
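For a sense of how that works, here is an illustrative sketch (not this fork's actual GUI code) of classic `argparse` options being reused as interactive defaults; `--prompt`, `--seed`, and `--n_samples` mirror upstream `txt2img.py` flags, while `--interactive` is this fork's addition:

```python
# Illustrative only: parse classic CLI arguments and reuse them as GUI defaults.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--interactive", action="store_true")
parser.add_argument("--prompt", type=str, default="a painting of a fox")
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--n_samples", type=int, default=1)
opt = parser.parse_args()

if opt.interactive:
    # These parsed values would become the initial contents of the GUI widgets.
    gui_defaults = {"prompt": opt.prompt, "seed": opt.seed, "n_samples": opt.n_samples}
    print("GUI defaults:", gui_defaults)
```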
More info and features coming soon, so keep your repository up to date.
From Stability.ai:
Stable Diffusion was made possible thanks to a collaboration with Stability AI and Runway and builds upon our previous work:
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*,
Andreas Blattmann*,
Dominik Lorenz,
Patrick Esser,
Björn Ommer
CVPR '22 Oral
which is available on GitHub. PDF at arXiv. Please also visit our Project page.
Stable Diffusion is a latent text-to-image diffusion model.
Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database.
Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See this section below and the model card.
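As a self-contained illustration of that text conditioning (this assumes the Hugging Face `transformers` library and is not code from this repository), a prompt can be encoded with the same frozen CLIP ViT-L/14 text encoder:

```python
# Standalone sketch: encode a prompt with the frozen CLIP ViT-L/14 text encoder,
# the model family Stable Diffusion uses for text conditioning.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(name)
text_encoder = CLIPTextModel.from_pretrained(name).eval()

tokens = tokenizer(["a photograph of an astronaut riding a horse"],
                   padding="max_length", max_length=77, truncation=True,
                   return_tensors="pt")
with torch.no_grad():
    conditioning = text_encoder(**tokens).last_hidden_state
print(conditioning.shape)  # torch.Size([1, 77, 768])
```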