sd-webui-controlnet: A Python repository from yj7082126

sd-webui-controlnet

(WIP) WebUI extension for ControlNet and other injection-based SD controls.

This extension for AUTOMATIC1111's Stable Diffusion web UI lets the web UI add ControlNet to the original Stable Diffusion model when generating images. ControlNet is added on the fly; no merging is required.

ControlNet is a neural network structure to control diffusion models by adding extra conditions.

Thanks & Inspired by: kohya-ss/sd-webui-additional-networks

Install

1. Open the "Extensions" tab.
2. Open the "Install from URL" tab inside it.
3. Enter https://github.com/Mikubill/sd-webui-controlnet.git into "URL for extension's git repository".
4. Press the "Install" button.
5. Wait about 5 seconds until you see the message "Installed into stable-diffusion-webui\extensions\sd-webui-controlnet. Use Installed tab to restart".
6. Go to the "Installed" tab, click "Check for updates", and then click "Apply and restart UI". (You can use the same method to update ControlNet later.)
7. Completely restart the A1111 webui, including your terminal. (If you do not know what a "terminal" is, you can reboot your computer: turn it off and turn it on again.)
8. Download models (see below).
9. After you put the models in the correct folder, you may need to refresh to see them. The refresh button is to the right of the "Model" dropdown.

Download Models

Right now all 14 ControlNet 1.1 models are in beta testing. Here is the discussion and bug report.

Download the models from ControlNet 1.1: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

You need to download the model files ending with ".pth".

Put the models in "stable-diffusion-webui\extensions\sd-webui-controlnet\models". All "yaml" files are already included, so you only need to download the "pth" files.

Note: If you download models elsewhere, make sure that each yaml file name matches its model file name, and manually rename the yaml files where they differ. Otherwise, models may behave unexpectedly. You can ignore this if you download models from official sources.
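The name-matching rule above can be checked with a short script. This is a minimal sketch, not part of the extension: the function name find_unpaired_models is hypothetical, and it assumes that "same name" means the ".pth" and ".yaml" files share the same file stem.

```python
from pathlib import Path


def find_unpaired_models(models_dir):
    """Return the names of .pth files that have no .yaml file with the same stem.

    Assumption: a model and its config are "paired" when only the suffix differs,
    e.g. foo.pth and foo.yaml.
    """
    models_dir = Path(models_dir)
    yaml_stems = {p.stem for p in models_dir.glob("*.yaml")}
    return sorted(
        p.name for p in models_dir.glob("*.pth") if p.stem not in yaml_stems
    )
```

Calling find_unpaired_models on your models folder returns an empty list when every ".pth" file has a matching ".yaml" file.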

(For authors of other ControlNet model extractions or fp16 model providers: some models, like "shuffle", now need the YAML file so that we know the outputs of ControlNet should pass through global average pooling before being injected into the SD U-Net. When you rename and deliver your processed models, please include yaml files with the same filenames.)

Do not right-click the filenames on the HuggingFace website to download. Some users right-clicked those HuggingFace HTML pages and saved them as PTH/YAML files, which are not the correct PTH/YAML files. Instead, click the small download arrow “↓” icon on HuggingFace to download.
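One way to spot a web page that was saved by mistake is to look at the first bytes of the file. The sketch below is a heuristic under stated assumptions: looks_like_html is a hypothetical helper, and it only assumes that a saved HTML page starts with "<!doctype html" or "<html" after optional whitespace, which real model files do not.

```python
def looks_like_html(path, probe_bytes=512):
    """Heuristic: True if the file starts like an HTML page instead of model data."""
    with open(path, "rb") as f:
        # Only the first few hundred bytes are read, so large models stay cheap to check.
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

If this returns True for a ".pth" or ".yaml" file, download the file again with the download arrow.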

New Features in A1111 ControlNet Extension 1.1

Perfect Support for All ControlNet 1.0/1.1 and T2I Adapter Models.

We now have perfect support for all available models and preprocessors, including perfect support for the T2I style adapter and ControlNet 1.1 Shuffle. (Make sure that your YAML file names and model file names are the same; see also the YAML files in "stable-diffusion-webui\extensions\sd-webui-controlnet\models".)

Perfect Support for A1111 High-Res. Fix

Now if you turn on High-Res Fix in A1111, each controlnet will output two different control images: a small one and a large one. The small one is for your basic generating, and the big one is for your High-Res Fix generating. The two control images are computed by a smart algorithm called "super high-quality control image resampling". This is turned on by default, and you do not need to change any setting.

Perfect Support for A1111 I2I and Mask

ControlNet is now extensively tested with A1111's different types of masks, including "Inpaint masked"/"Inpaint not masked", "Whole picture"/"Only masked", and "Only masked padding"/"Mask blur". The resizing perfectly matches A1111's "Just resize"/"Crop and resize"/"Resize and fill". This means you can use ControlNet nearly everywhere in your A1111 UI without difficulty!

Pixel Perfect Mode

Now if you turn on pixel-perfect mode, you do not need to set preprocessor (annotator) resolutions manually. The ControlNet will automatically compute the best annotator resolution for you so that each pixel perfectly matches Stable Diffusion.
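To illustrate the idea, here is a sketch of how an annotator resolution could be derived from the input image size and the generation size. It is an assumption made for illustration, not necessarily the computation the extension performs: the function name, the fit_inside flag, and the convention that the annotator resolution refers to the shorter image side are all hypothetical.

```python
def estimate_annotator_resolution(raw_h, raw_w, target_h, target_w, fit_inside=False):
    """Estimate a preprocessor resolution so the control map needs no extra rescaling.

    Assumptions (illustrative only):
    - the annotator resolution is the length of the shorter side of its output;
    - fit_inside=False scales the image to cover the target (as in cropping),
      fit_inside=True scales it to fit within the target (as in padding).
    """
    scale_h = target_h / raw_h
    scale_w = target_w / raw_w
    scale = min(scale_h, scale_w) if fit_inside else max(scale_h, scale_w)
    return int(round(scale * min(raw_h, raw_w)))
```

For a 1000x2000 (height x width) input and a 512x512 target, covering the target gives a scale of 0.512, so the shorter side becomes 512 pixels and no second resampling is needed after cropping.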

User-Friendly GUI and Preprocessor Preview

We reorganized some previously confusing UI, such as "canvas width/height for new canvas", which is now behind the 📝 button. The preview GUI is now controlled by the "allow preview" option and the trigger button 💥. The preview image size is better than before, and you do not need to scroll up and down: your A1111 GUI will not be messed up anymore!

Bug fix of Previous Guess Mode

One well-known bug of the previous A1111 ControlNet Extension 1.0 is that if you use guess mode in one control unit when running multiple ControlNets, all ControlNets switch to guess mode, so users cannot turn guess mode on or off for each ControlNet separately. We have fixed this problem, and each ControlNet's guess mode can now be controlled independently.

Update from ControlNet 1.0 to 1.1

If you are a previous user of ControlNet 1.0, you may do one of the following:

If you are not sure, back up and remove the folder "stable-diffusion-webui\extensions\sd-webui-controlnet", and then start from step 1 in the Install section above.
Or start from step 6 in the Install section above.

Previous Models

Big Models: https://huggingface.co/lllyasviel/ControlNet/tree/main/models

Small Models: https://huggingface.co/webui/ControlNet-modules-safetensors

You can still use all models from the previous ControlNet 1.0. The previous "depth" is now called "depth_midas", the previous "normal" is called "normal_midas", and the previous "hed" is called "softedge_hed". Starting from 1.1, all line maps, edge maps, lineart maps, and boundary maps have a black background and white lines.
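If you keep scripts or saved settings from 1.0, the renames listed above can be applied with a small lookup. This is a sketch: the dictionary and both helper names are hypothetical, and the inversion helper assumes an 8-bit grayscale map stored as nested lists, where a map with black lines on a white background is inverted to match the 1.1 convention.

```python
# Renames taken from the paragraph above; names not listed are left unchanged.
LEGACY_PREPROCESSOR_NAMES = {
    "depth": "depth_midas",
    "normal": "normal_midas",
    "hed": "softedge_hed",
}


def to_v11_preprocessor_name(name):
    """Map a ControlNet 1.0 preprocessor name to its 1.1 name."""
    return LEGACY_PREPROCESSOR_NAMES.get(name, name)


def invert_line_map(pixels):
    """Invert an 8-bit grayscale map so dark lines on white become white lines on black."""
    return [[255 - value for value in row] for row in pixels]
```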

Usage

Open the "txt2img" or "img2img" tab and write your prompts.
Press "Refresh models" and select the model you want to use. (If nothing appears, try reloading or restarting the webui.)
Upload your image and select a preprocessor. Done.

Examples

(Example images not shown. Columns: Source, Input, Output; two rows are marked "no preprocessor".)

T2I-Adapter Support

(From TencentARC/T2I-Adapter)

T2I-Adapter is a small network that can provide additional guidance for pre-trained text-to-image models.

To use T2I-Adapter models:

Download files from https://huggingface.co/TencentARC/T2I-Adapter
Copy the corresponding config file and rename it to match the model name (see the list below).
It's better to use a slightly lower strength (t), such as 0.6-0.8, when generating images with the sketch model. (ref: ldm/models/diffusion/plms.py)

Adapter	Config
t2iadapter_canny_sd14v1.pth	sketch_adapter_v14.yaml
t2iadapter_sketch_sd14v1.pth	sketch_adapter_v14.yaml
t2iadapter_seg_sd14v1.pth	image_adapter_v14.yaml
t2iadapter_keypose_sd14v1.pth	image_adapter_v14.yaml
t2iadapter_openpose_sd14v1.pth	image_adapter_v14.yaml
t2iadapter_color_sd14v1.pth	t2iadapter_color_sd14v1.yaml
t2iadapter_style_sd14v1.pth	t2iadapter_style_sd14v1.yaml
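The copy-and-rename step can be scripted from the table above. This is a minimal sketch: install_adapter_config and its config_dir parameter (the folder where you saved the config files) are hypothetical, and it assumes the renamed config should sit next to the model with a ".yaml" suffix.

```python
import shutil
from pathlib import Path

# Adapter-to-config pairs copied from the table above.
ADAPTER_CONFIGS = {
    "t2iadapter_canny_sd14v1.pth": "sketch_adapter_v14.yaml",
    "t2iadapter_sketch_sd14v1.pth": "sketch_adapter_v14.yaml",
    "t2iadapter_seg_sd14v1.pth": "image_adapter_v14.yaml",
    "t2iadapter_keypose_sd14v1.pth": "image_adapter_v14.yaml",
    "t2iadapter_openpose_sd14v1.pth": "image_adapter_v14.yaml",
    "t2iadapter_color_sd14v1.pth": "t2iadapter_color_sd14v1.yaml",
    "t2iadapter_style_sd14v1.pth": "t2iadapter_style_sd14v1.yaml",
}


def install_adapter_config(model_path, config_dir):
    """Copy the matching config next to the model, named after the model file."""
    model_path = Path(model_path)
    source = Path(config_dir) / ADAPTER_CONFIGS[model_path.name]
    target = model_path.with_suffix(".yaml")
    # Skip the copy when the config already has the right name and location.
    if source.resolve() != target.resolve():
        shutil.copyfile(source, target)
    return target
```

The function raises KeyError for a model file that is not in the table, which avoids silently pairing a model with the wrong config.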

Note:

This implementation is experimental; results may differ from the original repo.
Some adapters may have mapping deviations (see issue lllyasviel/ControlNet#255).

Adapter Examples

(Example images not shown. Columns: Source, Input, Output; four rows are marked "no preprocessor" and one row is marked "clip, non-image".)

Examples by catboxanon, no tweaking or cherrypicking. (Color Guidance)

(Example images not shown. Columns: Image, Disabled, Enabled.)

Minimum Requirements

(Windows) (NVIDIA: Ampere) 4 GB VRAM: with --xformers enabled and Low VRAM mode ticked in the UI, generation works up to 768x832.

Guess Mode (Non-Prompt Mode, Experimental)

Guess Mode is CFG Based ControlNet + Exponential decay in weighting.
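The exponential decay in weighting can be pictured as a per-output scale that shrinks geometrically away from the deepest ControlNet output. The sketch below is illustrative: the function name is hypothetical, and the defaults of 13 outputs and a decay base of 0.825 are assumptions, not values stated in this document.

```python
def guess_mode_weights(num_outputs=13, decay=0.825, weight=1.0):
    """Return one scale per ControlNet output, growing geometrically with depth.

    The last (deepest) output keeps the full weight; each earlier output is
    multiplied by one more factor of `decay`.
    """
    return [weight * decay ** (num_outputs - 1 - i) for i in range(num_outputs)]
```

With these defaults the first output is scaled by 0.825 to the power 12 and the last by 1.0, so shallow outputs contribute far less than deep ones.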

See issue Mikubill#236 for more details.

Original introduction from controlnet:

The "guess mode" (or called non-prompt mode) will completely unleash all the power of the very powerful ControlNet encoder.

In this mode, you can just remove all prompts, and then the ControlNet encoder will recognize the content of the input control map, like depth map, edge map, scribbles, etc.

This mode is very suitable for comparing different methods to control stable diffusion because the non-prompted generating task is significantly more difficult than prompted task. In this mode, different methods' performance will be very salient.

For this mode, we recommend using 50 steps and a guidance scale between 3 and 5.

Multi-ControlNet / Joint Conditioning (Experimental)

This option allows multiple ControlNet inputs for a single generation. To enable it, change "Multi ControlNet: Max models amount (requires restart)" in the settings, then restart the WebUI for the change to take effect.

In extension 1.0, Guess Mode applies to all ControlNets if it is enabled on any of them; 1.1 lets each ControlNet's Guess Mode be set independently.

(Example images not shown. Columns: Source A, Source B, Output.)

Weight and Guidance Strength/Start/End

Weight is the weight of the ControlNet "influence". It's analogous to prompt attention/emphasis, e.g. (myprompt: 1.2). Technically, it's the factor by which the ControlNet outputs are multiplied before they are merged with the original SD U-Net.
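As a toy illustration of that multiplication, the sketch below uses plain floats as stand-ins for feature tensors. The function name is hypothetical, and treating the merge as an element-wise addition is an assumption made for the example.

```python
def merge_control(unet_features, control_outputs, weight=1.0):
    """Scale each ControlNet output by `weight`, then add it to the U-Net feature."""
    if len(unet_features) != len(control_outputs):
        raise ValueError("expected one ControlNet output per U-Net feature")
    return [f + weight * c for f, c in zip(unet_features, control_outputs)]
```

A weight of 0 leaves the U-Net features untouched, while a weight above 1 amplifies the ControlNet contribution.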

Guidance Start/End is the range, as a fraction of total steps, over which the ControlNet applies (guidance strength = guidance end). It's analogous to prompt editing/shifting, e.g. [myprompt::0.8] (applies from the beginning until 80% of total steps).
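The step range can be sketched as a simple check per sampling step. The function name is hypothetical, and the boundary convention (start inclusive, end exclusive, progress measured as step divided by total steps) is an assumption; the extension may round differently.

```python
def control_active(step, total_steps, guidance_start=0.0, guidance_end=1.0):
    """Return True if the ControlNet should apply at this zero-based step."""
    progress = step / total_steps
    return guidance_start <= progress < guidance_end
```

With 20 steps and a guidance end of 0.8, this check is true for steps 0 through 15, which is 80% of the run.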

API/Script Access

This extension can accept txt2img or img2img tasks via API or external extension call. Note that you may need to enable "Allow other scripts to control this extension" in settings for external calls.

To use the API: start WebUI with the --api argument and go to http://webui-address/docs for the documentation, or check out the examples.

To use external calls: check out the Wiki.
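A request could be assembled along these lines. Treat this as a sketch under assumptions: the /sdapi/v1/txt2img path and the "alwayson_scripts" / "controlnet" / "args" layout with the "input_image", "module", "model", and "weight" fields should be checked against the /docs page of your own WebUI, since they may differ between versions. Both function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, image_bytes, module, model, weight=1.0):
    """Build a txt2img request body carrying one ControlNet unit (assumed schema)."""
    unit = {
        "input_image": base64.b64encode(image_bytes).decode("ascii"),
        "module": module,  # preprocessor name
        "model": model,    # model name as shown in the "Model" dropdown
        "weight": weight,
    }
    return {
        "prompt": prompt,
        "steps": 20,
        "alwayson_scripts": {"controlnet": {"args": [unit]}},
    }


def post_txt2img(base_url, payload):
    """POST the payload to a WebUI started with --api and return the parsed JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Building the payload separately from sending it makes it easy to print and compare the body with the schema shown on the /docs page before making a call.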

Command Line Arguments

This extension adds these command line arguments to the webui:

    --controlnet-dir <path to directory with controlnet models>                                ADD a controlnet models directory
    --controlnet-annotator-models-path <path to directory with annotator model directories>    SET the directory for annotator models
    --no-half-controlnet                                                                       load controlnet models in full precision

MacOS Support

Tested with pytorch nightly: Mikubill#143 (comment)

To use this extension with mps and normal pytorch, currently you may need to start WebUI with --no-half.

Example: Visual-ChatGPT (by API)

Quick start:

# Run WebUI in API mode
python launch.py --api --xformers

# Install/Upgrade transformers
pip install -U transformers

# Install deps
pip install langchain==0.0.101 openai 

# Run example
python example/chatgpt.py

Limits

Dragging a large file onto the Web UI may freeze the entire page. It is better to use the upload file option instead.
Just like WebUI's hijack, we use interpolation to accept arbitrary size configurations (see scripts/cldm.py).
