/sd-webui-neutral-prompt

Collision-free AND keywords for a1111 webui!

Primary language: Python. License: MIT.

Neutral Prompt

Neutral prompt is an a1111 webui extension that adds alternative composable diffusion keywords to the prompt language. It enhances the original implementation using more recent research.

Features

  • Perp-Neg orthogonal prompts, invoked using the AND_PERP keyword
  • Saliency-aware noise blending, invoked using the AND_SALT keyword (credits to Magic Fusion for the algorithm used to determine SNB maps from epsilons)
  • Semantic guidance top-k filtering, invoked using the AND_TOPK keyword (reference: https://arxiv.org/abs/2301.12247)
  • Standard-deviation-based CFG rescaling (reference: https://arxiv.org/abs/2305.08891, section 3.4)
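
As a rough illustration of the CFG rescaling item, the formula from section 3.4 of the referenced paper can be sketched like this. The sketch assumes noise predictions flattened into plain lists of floats; the names `rescale_cfg` and `phi` are hypothetical and are not taken from the extension's source, and the default value of `phi` is only a placeholder.

```python
import statistics

def rescale_cfg(x_pos, x_cfg, phi=0.7):
    """Blend the CFG result with a copy rescaled to the std of the positive prediction.

    x_pos: noise prediction for the positive prompt (flattened, assumption)
    x_cfg: noise prediction after classifier-free guidance (flattened, assumption)
    phi:   blend factor; 0 keeps x_cfg unchanged, 1 uses the fully rescaled result
    """
    sigma_pos = statistics.pstdev(x_pos)
    sigma_cfg = statistics.pstdev(x_cfg)
    if sigma_cfg == 0:
        # Nothing to rescale when the CFG output has no spread.
        return list(x_cfg)
    factor = sigma_pos / sigma_cfg
    # x_final = phi * x_rescaled + (1 - phi) * x_cfg
    return [phi * (v * factor) + (1 - phi) * v for v in x_cfg]
```

With `phi=1` the output has the same standard deviation as `x_pos`, which counteracts the over-exposure that a high CFG scale tends to cause.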

Usage

Keyword AND_PERP

The AND_PERP keyword, standing for "PERPendicular AND", integrates the orthogonalization process described in the Perp-Neg paper. Essentially, AND_PERP allows for prompting concepts that overlap heavily with regular prompts, by negating contradicting noise.

You could visualize it as such: if AND prompts are "greedy" (taking as much space as possible in the output), AND_PERP prompts are the opposite, relinquishing control as soon as there is a disagreement in the generated output.
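
The orthogonalization step can be sketched as a plain vector projection. This is a minimal illustration, assuming each noise prediction is flattened into a list of floats and expressed relative to the unconditional prediction (as in the Perp-Neg formulation); the function name `perpendicular` is hypothetical and not taken from the extension.

```python
def perpendicular(vector, reference):
    """Return the component of `vector` that is orthogonal to `reference`."""
    ref_norm_sq = sum(r * r for r in reference)
    if ref_norm_sq == 0:
        # A zero reference has no direction to remove.
        return list(vector)
    # Scalar projection coefficient of `vector` onto `reference`.
    scale = sum(v * r for v, r in zip(vector, reference)) / ref_norm_sq
    # Subtract the parallel part; only the perpendicular part remains.
    return [v - scale * r for v, r in zip(vector, reference)]

main_noise = [1.0, 0.0]   # stand-in for the regular prompt
perp_noise = [1.0, 1.0]   # stand-in for the AND_PERP prompt
print(perpendicular(perp_noise, main_noise))  # [0.0, 1.0]
```

Only the part of the AND_PERP noise that the regular prompt does not already cover survives, which matches the "relinquishing control" intuition above.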

Keyword AND_SALT

Saliency-aware noise blending is made possible using the AND_SALT keyword, shorthand for "SALienT AND". In essence, AND_SALT monitors high noise activity during denoising and dominates any high-activation regions in the output.

Think of it as a territorial dispute: the noise generated by the AND prompts is one country, and the noise(s) generated by AND_SALT prompts represent neighbouring nations. They're all vying for the same land - whoever strikes the strongest at a given time and location claims it.
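
The territorial picture can be sketched in a few lines. This is a simplified illustration, assuming flattened noise maps and a saliency map defined as a softmax over absolute values; the saliency function actually used by the extension or by Magic Fusion may differ, and the names `saliency` and `salient_blend` are hypothetical.

```python
import math

def saliency(noise):
    """Softmax over absolute values: large activations get large scores."""
    peak = max(abs(v) for v in noise)
    # Subtracting the peak keeps exp() numerically stable.
    exps = [math.exp(abs(v) - peak) for v in noise]
    total = sum(exps)
    return [e / total for e in exps]

def salient_blend(base, challenger):
    """Per position, keep whichever noise has the higher saliency score."""
    base_sal = saliency(base)
    chal_sal = saliency(challenger)
    return [
        c if cs > bs else b
        for b, c, bs, cs in zip(base, challenger, base_sal, chal_sal)
    ]

and_noise = [0.1, 0.1, 0.1, 0.1]    # stand-in for the AND prompt
salt_noise = [0.0, 5.0, 0.0, 0.0]   # stand-in for the AND_SALT prompt
print(salient_blend(and_noise, salt_noise))  # [0.1, 5.0, 0.1, 0.1]
```

The AND_SALT noise claims only the position where it is clearly the strongest; everywhere else the AND noise keeps its territory.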

Keyword AND_TOPK

The AND_TOPK keyword refers to "TOP-K filtering". It keeps only the "k" highest activation latent pixels in the noise map and discards the rest. It works similarly to AND_SALT, except that the high-activation regions are simply added instead of replacing previous content.

Currently, k is fixed at 5% of all latent pixels, meaning the weakest 95% of latent pixel values are discarded at each step.

Top-k filtering is useful when you want to have a more targeted effect on the generated image. It should work best with smaller objects and details.
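
A minimal sketch of this filtering, assuming a flattened noise map and ranking by absolute value (the ranking criterion is an assumption; the names `top_k_filter` and `add_top_k` are hypothetical):

```python
def top_k_filter(noise, fraction=0.05):
    """Keep the `fraction` of entries with the largest magnitude, zero the rest."""
    k = max(1, int(len(noise) * fraction))
    # Indices sorted from strongest to weakest activation.
    ranked = sorted(range(len(noise)), key=lambda i: abs(noise[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(noise)]

def add_top_k(base, extra, fraction=0.05):
    """Add only the strongest entries of `extra` on top of `base`."""
    return [b + e for b, e in zip(base, top_k_filter(extra, fraction))]

topk_noise = [0.1] * 19 + [-3.0]   # 20 latent pixels, one strong activation
print(top_k_filter(topk_noise))    # nineteen 0.0 entries followed by -3.0
```

Unlike the AND_SALT sketch of replacing content, the surviving entries are added to the existing noise, so the base prompt is still present underneath.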

Examples

Using the AND_PERP Keyword

Here is an example to illustrate one use case of the AND_PERP keyword. Prompt:

beautiful castle landscape AND monster house castle :-1

This is an XY grid with prompt S/R AND, AND_PERP:

[Image: XY grid comparing AND and AND_PERP outputs]

Key observations:

  • The AND_PERP images exhibit a higher dynamic range compared to the AND images.
  • Since the prompts have a lot of overlap, the AND images sometimes struggle to depict a castle. This isn't a problem for the AND_PERP images.
  • The AND images tend to lean towards a purple color, because this was the path of least resistance between the two opposing prompts during generation. In contrast, the AND_PERP images, free from this tug-of-war, present a clearer representation.

Using the AND_SALT Keyword

The AND_SALT keyword invokes saliency-aware noise blending. It spotlights and accentuates areas of high activation in the output.

Consider this example prompt utilizing AND_SALT:

a vibrant rainforest with lush green foliage
AND_SALT the glimmering rays of a golden sunset piercing through the trees

In this case, the extension identifies and isolates the most salient regions in the noise of the sunset prompt. Then, the extension applies this salient noise to the noise of the rainforest prompt. Only the portions of the rainforest noise that coincide with the salient areas of the sunset noise are affected. These areas are replaced by noise from the sunset prompt.

This is an XY grid with prompt S/R AND_SALT, AND, AND_PERP:

[Image: XY grid comparing AND_SALT, AND, and AND_PERP outputs for the rainforest prompt]

Key observations:

  • AND_SALT behaves more diplomatically, enhancing areas where its impact makes the most sense and aligning with high activity regions in the output
  • AND gives equal weight to both prompts, creating a blended result
  • AND_PERP will find its way through anything not blocked by the regular prompt

Advanced Features

Nesting prompts

The extension supports nesting of all prompt keywords including AND, allowing greater flexibility and control over the final output. Here's an example of how these keywords can be combined:

magical tree forests, eternal city
AND_PERP [
    electrical pole voyage
    AND_SALT small nocturne companion
]
AND_SALT [
    electrical tornado
    AND_SALT electric arcs, bzzz, sparks
]

To generate the final noise from the diffusion model:

  1. The extension first processes the root AND prompts. In this case, it's just magical tree forests, eternal city
  2. It then processes the AND_SALT prompt small nocturne companion in the context of electrical pole voyage. This enhances salient features in the electrical pole voyage noise
  3. This new noise is orthogonalized with the noise from magical tree forests, eternal city, blending the details of the 'electrical pole voyage' into the main scene without creating conflicts
  4. The extension then turns to the second AND_SALT group. It processes electric arcs, bzzz, sparks in the context of electrical tornado, amplifying salient features in the electrical tornado noise
  5. The noise from this AND_SALT group is then combined with the noise of magical tree forests, eternal city. The final output retains the strongest features from both the electrical tornado (enhanced by 'electric arcs, bzzz, sparks') and the earlier 'magical tree forests, eternal city' scene influenced by the 'electrical pole voyage'

Each keyword can define a distinct denoising space within its square brackets [...]. Prompts inside it merge into a single noise map before further processing down the prompt tree.
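
One way to picture the prompt tree is as a recursive evaluation in which every bracketed group collapses to a single noise map before its parent uses it. The sketch below is a simplified reading of the steps above, not the extension's implementation: it ignores prompt weights, applies keywords in order of appearance, uses flattened noise lists, and assumes every group has at least one plain AND child. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Union

Noise = List[float]

@dataclass
class Leaf:
    keyword: str  # "AND", "AND_PERP" or "AND_SALT"
    noise: Noise  # stand-in for the model's noise prediction for this prompt

@dataclass
class Group:
    keyword: str
    children: List[Union["Leaf", "Group"]] = field(default_factory=list)

def perpendicular(vector: Noise, reference: Noise) -> Noise:
    ref_norm_sq = sum(r * r for r in reference)
    if ref_norm_sq == 0:
        return list(vector)
    scale = sum(v * r for v, r in zip(vector, reference)) / ref_norm_sq
    return [v - scale * r for v, r in zip(vector, reference)]

def saliency(noise: Noise) -> Noise:
    peak = max(abs(v) for v in noise)
    exps = [math.exp(abs(v) - peak) for v in noise]
    total = sum(exps)
    return [e / total for e in exps]

def salient_blend(base: Noise, challenger: Noise) -> Noise:
    pairs = zip(base, challenger, saliency(base), saliency(challenger))
    return [c if cs > bs else b for b, c, bs, cs in pairs]

def evaluate(node: Union[Leaf, Group]) -> Noise:
    """Collapse a node (and everything nested in it) into one noise map."""
    if isinstance(node, Leaf):
        return node.noise
    # Resolve children first, so nested groups become single noise maps.
    resolved = [(child.keyword, evaluate(child)) for child in node.children]
    # Plain AND children are summed into the base noise of this group.
    base_parts = [noise for kw, noise in resolved if kw == "AND"]
    base = [sum(values) for values in zip(*base_parts)]
    for kw, noise in resolved:
        if kw == "AND_PERP":
            # Add only the part that does not conflict with the base.
            base = [b + p for b, p in zip(base, perpendicular(noise, base))]
        elif kw == "AND_SALT":
            # Replace the base wherever the child is more salient.
            base = salient_blend(base, noise)
    return base

tree = Group("AND", [
    Leaf("AND", [1.0, 0.0]),           # root prompt
    Group("AND_PERP", [
        Leaf("AND", [1.0, 1.0]),       # first prompt inside the brackets
        Leaf("AND_SALT", [0.0, 3.0]),  # refines the prompt above
    ]),
])
print(evaluate(tree))  # [1.0, 3.0]
```

The inner group is resolved to `[1.0, 3.0]` first, and only then orthogonalized against the root noise, mirroring the order of steps 2 and 3 above.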

While there's no strict limit on the depth of nesting, experimental evidence suggests that going beyond a depth of 2 is generally unnecessary. We're still exploring the added precision from deeper nesting. If you discover innovative ways of controlling the generations using nested prompts, please share in the discussions!


Known issues

  • The webui doesn't support composable diffusion via AND for samplers DDIM, PLMS, and UniPC. As the extension relies on composable diffusion, it will revert to the unmodified sampler implementation when these are used.

Special Mentions

Special thanks to these people for helping make this extension possible:

  • Ai-Casanova: for sharing mathematical knowledge and time, and for conducting proof-testing to enhance the robustness of this extension