/kandinsky-for-automatic1111

Editing to jibe with my existing Kandinsky installation

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0)

Kandinsky 2.1 For Automatic1111 Extension

Adds a script that runs the Kandinsky 2.1 model.

Disclaimer

ALPHA VERSION NOT PRODUCTION-READY

Note: the progress bar is not supported yet.

Examples

The following examples are not cherry-picked and use various settings and resolutions.

[image]

Prompt: sky, daylight, realistic, high quality, in focus, 16k, HQ
Steps: 64
Sampler: Default
CFG scale: 7
Seed: 3479955
Size: 1024x1024
Inference Steps: 128

[image]

Prompt: As the sun sets, les arbres whisper, mientras el río serpentea gracefully, отражая прекрасные colors, majestic mountains stand tall, evoking tranquillité et harmonie, 空中舞动着美丽的蝴蝶, 空と地球の神秘なつながり, रंगबिरंगी वस्तुएं। (from chatgpt)
In English: As the sun sets, the trees whisper, while the river gracefully meanders, reflecting beautiful colors, majestic mountains stand tall, evoking tranquility and harmony, butterflies dance in the air, the mysterious connection between sky and earth, colorful objects.
Steps: 64
Sampler: Default
CFG scale: 7
Seed: 3479955
Size: 768x768
Inference Steps: 128

[image]

Prompt: cat, realistic, high quality, 4k
Steps: 64
Sampler: Default
CFG scale: 7
Seed: 3479955
Size: 1024x1024
Inference Steps: 128

[image]

Prompt: spaceship, retro, realistic, high quality, 4k
Steps: 64
Sampler: Default
CFG scale: 7
Seed: 3479955
Size: 512x512
Inference Steps: 128

[image]

Prompt: cyberpunk city, distopian, high quality, 4k
Steps: 64
Sampler: Default
CFG scale: 3
Seed: 3479955
Size: 768x768
Inference Steps: 128

Image Mixing

Combine images and/or prompts. This can be used for style transfer or for combining a background with a subject.

Prompt: cat, high quality, 4k
Steps: 64
Sampler: Default
CFG scale: 7
Seed: 3479955494
Size: 1536x768
Inference Steps: 128

Mixed with:

[image]

Result:

[image]

How To Use

  1. Select "Kandinsky" in the scripts section.
  2. Set "Prior Inference Steps". Increasing the value improves the results, but the improvement plateaus at around 128. Beyond that, the image may change, but the quality remains consistent.
  3. If needed, the model will start downloading automatically.
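
For orientation, the two-stage flow that these settings feed into can be sketched with the diffusers pipelines directly. This is a minimal sketch, not the extension's actual code: the helper names `generate` and `load_pipelines` are hypothetical, the prior checkpoint id `kandinsky-community/kandinsky-2-1-prior` is the companion model to the decoder on Hugging Face, and mapping "Prior Inference Steps" to the prior pipeline's `num_inference_steps` is an assumption.

```python
def generate(prior, decoder, prompt, negative_prompt="", prior_steps=128,
             steps=64, cfg_scale=7.0, width=768, height=768, generator=None):
    """Run the prior, then the decoder; both are loaded diffusers pipelines."""
    # Stage 1: the prior turns the text prompt into image embeddings.
    prior_out = prior(
        prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=prior_steps,
        generator=generator,
    )
    # Stage 2: the decoder denoises latents conditioned on those embeddings.
    result = decoder(
        prompt,
        negative_prompt=negative_prompt,
        image_embeds=prior_out.image_embeds,
        negative_image_embeds=prior_out.negative_image_embeds,
        num_inference_steps=steps,
        guidance_scale=cfg_scale,
        width=width,
        height=height,
        generator=generator,
    )
    return result.images[0]


def load_pipelines(device="cuda"):
    """Download (on first use) and load both stages in half precision."""
    # Imported lazily so that defining these helpers needs no GPU stack.
    import torch
    from diffusers import KandinskyPipeline, KandinskyPriorPipeline

    prior = KandinskyPriorPipeline.from_pretrained(
        "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
    ).to(device)
    decoder = KandinskyPipeline.from_pretrained(
        "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
    ).to(device)
    return prior, decoder
```

With a CUDA device available, `prior, decoder = load_pipelines()` followed by `generate(prior, decoder, "cat, realistic, high quality, 4k")` should return a PIL image. A seeded `torch.Generator` can be passed as `generator` to make runs repeatable.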

Image Mixing

Prompt + Image

  1. In txt2img, set the prompt.
  2. In the extra image field in the script section, set the image.
  3. Set "Interpolate Image 1 Strength" to the desired contribution of the image generated by the prompt.
  4. Set "Interpolate Image 2 Strength" to the desired contribution of the image in the script section.

Image + Image

  1. In img2img, set the first image.
  2. In the extra image field in the script section, set the second image.
  3. Set "Interpolate Image 1 Strength" to the desired contribution of the img2img image.
  4. Set "Interpolate Image 2 Strength" to the desired contribution of the image in the script section.
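
To make the two strength settings concrete, here is a minimal sketch of how mixing could work, assuming the strengths act as weights in a weighted sum of the two inputs' image embeddings. That is how the `interpolate` method of the diffusers `KandinskyPriorPipeline` combines its inputs; whether the extension does exactly this is an assumption. The function name `mix_embeddings` and the toy two-dimensional vectors are illustrative only.

```python
def mix_embeddings(embedding_1, embedding_2, strength_1, strength_2):
    """Return the weighted sum of two equal-length embedding vectors."""
    if len(embedding_1) != len(embedding_2):
        raise ValueError("embeddings must have the same length")
    return [
        strength_1 * a + strength_2 * b
        for a, b in zip(embedding_1, embedding_2)
    ]


# Toy vectors standing in for real image embeddings.
from_prompt = [1.0, 0.0]
from_extra_image = [0.0, 1.0]

# Lean mostly on the extra image and keep a little of the prompt.
mixed = mix_embeddings(from_prompt, from_extra_image, 0.3, 0.7)
```

Under this assumption, raising one strength relative to the other pulls the result toward that input, which matches the style-transfer and background-plus-subject uses described earlier.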

Notes

  • Prompt size is 512 tokens
  • Seeds are somewhat consistent across different resolutions
  • Changing the sampling steps keeps the same image but changes its quality
  • The seed matters less than the prompt: subjects and compositions are very similar across seeds
  • It is very easy to "overcook" images with prompts; if this happens, remove keywords or reduce the CFG scale
    • Negative prompts aren't needed, so "low quality, bad quality..." can be omitted
    • Short positive prompts work well; too many keywords confuse the AI
  • Ignore the warning "Pipelines loaded with torch_dtype=torch.float16 cannot run with cpu device..."; it appears because the model is being moved to save VRAM

Features

  • Text to image
  • Batching
  • Img2img
  • Inpainting
  • Image mixing
  • VRAM optimizations (16-bit float and attention slicing)
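
The VRAM optimizations listed above correspond to standard diffusers options. A minimal sketch, assuming the pipeline is loaded through `from_pretrained`; the helper name `load_low_vram` is hypothetical, and the pipeline class and dtype are passed in as arguments so the snippet can be defined without importing torch:

```python
def load_low_vram(pipeline_cls, model_id, dtype):
    """Load a diffusers pipeline with reduced VRAM usage."""
    # 16-bit weights take roughly half the memory of 32-bit ones.
    pipe = pipeline_cls.from_pretrained(model_id, torch_dtype=dtype)
    # Compute attention in slices instead of all at once: lower peak
    # memory in exchange for somewhat slower inference.
    pipe.enable_attention_slicing()
    return pipe
```

With the real libraries this would be called as `load_low_vram(KandinskyPipeline, "kandinsky-community/kandinsky-2-1", torch.float16)`.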

Supported Settings

  • prompt
  • negative prompt
  • cfg scale
  • seed
  • width
  • height
  • sampling steps
  • denoising strength
  • batch count
  • batch size (only first image's seed can be replicated)
  • img2img image and inpainting
  • inpaint at full resolution (needs fixing)

Any other settings, such as seed variations, have no effect on generated images.
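
The batch-size caveat (only the first image's seed can be replicated) can be illustrated with a toy model. This is only a sketch under the assumption that a single random generator, seeded once, supplies the starting noise for every image in a batch. Python's `random` module stands in for the real noise source, and `noise_for_batch` is a hypothetical helper.

```python
import random


def noise_for_batch(seed, batch_size, values_per_image=4):
    """Draw starting noise for a whole batch from one seeded generator."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(values_per_image)]
        for _ in range(batch_size)
    ]


single = noise_for_batch(3479955, batch_size=1)[0]
batch = noise_for_batch(3479955, batch_size=3)

# The first image of the batch starts from the same noise as a single run.
first_matches = batch[0] == single
# Later images depend on the draws consumed before them, so rerunning
# with seed + 1 does not reproduce the second image.
second_matches = batch[1] == noise_for_batch(3479956, batch_size=1)[0]
```

In this model `first_matches` is true and `second_matches` is false, which is consistent with the caveat: the second and later images of a batch have no seed of their own that would reproduce them in isolation.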

Known Bugs

  • RAM leak (still investigating)

Limitations

  • Uses the diffusers image generation pipeline to run Kandinsky. Only "kandinsky-community/kandinsky-2-1" on Hugging Face is supported, so custom models cannot be used
  • No ControlNet
  • No training
  • No support for other extensions such as ultimate-upscale, tiled diffusion, etc.
  • No progress bar in the GUI
  • No choice of samplers
  • The Stable Diffusion model and VAE are not unloaded from RAM, resulting in ~15 GB of RAM usage
  • Seeds cannot be replicated within batches
  • The strength of individual words in the prompt can't be set
  • Other Automatic1111 features such as seed variations, hires fix, tiling, etc. are not supported
  • Can't be run together with other Automatic1111 scripts