This repository contains workflow nodes that add features to the InvokeAI text-to-image generator.
Currently this repo contains two nodes:
- Enhance Prompt Node -- Use an LLM to make your simple prompts fancy
- Describe Image Node -- Use an LLM to describe the contents of an uploaded image
The EnhancePrompt Invocation uses a local Ollama large language model (LLM) server to enhance a user-provided prompt by adding additional qualifiers. For example:
Input Prompt:

```
bears having a picnic
```

Enhanced Prompt:

```
Realistic, whimsical picnic scene featuring playful bears basking in a
sun-drenched meadow, surrounded by vibrant wildflowers and lush
greenery. Bears are adorned with colorful picnic blankets, hats, and
sunglasses, indulging in a feast of fresh berries, nuts, and
honey. The sky is clear and blue, casting a warm glow over the scene.
```
It does this by passing the input prompt to an Ollama LLM server running on the same machine as InvokeAI, along with an instruction to "rewrite the input in a form suitable for a text-to-image generator." The rewritten prompt can then be passed on to a Compel node for positive or negative text conditioning.
Although the node will work with any Ollama-compatible LLM, I recommend using `gnokit/improve-prompt`, a small (2B-parameter) model that was specifically trained for prompt enhancement. Using this model will help avoid out-of-memory (OOM) errors on CUDA systems when InvokeAI and the Ollama server contend with each other for VRAM.
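For illustration, the core round trip to the Ollama server can be sketched with the `ollama` Python client. This is a minimal sketch of the idea, not the node's actual implementation; the instruction wording, default model, and server address are assumptions:

```python
# Sketch of the enhancement round trip, assuming an Ollama server on its
# default address (http://localhost:11434). The instruction text below is
# illustrative; the node's exact wording may differ.
import ollama

INSTRUCTION = (
    "Rewrite the following input in a form suitable for a "
    "text-to-image generator:\n\n"
)

def enhance(prompt: str, model: str = "gnokit/improve-prompt") -> str:
    response = ollama.generate(model=model, prompt=INSTRUCTION + prompt)
    return response["response"]

print(enhance("bears having a picnic"))
```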
Alternatively, if you find yourself getting OOM errors, you can enable the "Offload From GPU" option, which purges the LLM from VRAM immediately after running it. The only drawback is that it causes a small delay before image rendering starts.
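One mechanism that can produce this behavior is Ollama's `keep_alive` request option, which controls how long a model stays resident after a request. Whether the node uses exactly this mechanism is an assumption; the sketch below just shows the option itself:

```python
# keep_alive=0 asks the Ollama server to evict the model from memory as
# soon as the request completes, freeing VRAM for the denoising step.
import ollama

response = ollama.generate(
    model="gnokit/improve-prompt",
    prompt="bears having a picnic",
    keep_alive=0,  # unload the model immediately after responding
)
print(response["response"])
```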
The Verbosity menu has four levels:
- off
- terse
- medium
- baroque

As its name implies, the off setting disables prompt rewriting completely and passes the prompt through unchanged.
Using the recommended `gnokit/improve-prompt` model, the remaining verbosity settings will have these effects:
- terse: produces a concise prompt that adheres more strictly to the input text. It may add backgrounds and styles if they were not specified in the input prompt.
- medium: produces a longer prompt that adds whimsical details, atmospheric descriptions, and other qualifiers.
- baroque: produces a long-winded prompt with florid, poetic, often grandiose language that is equally likely to confuse or inspire the downstream image generation. Be aware that the enhanced prompts created with this setting will sometimes exceed the token limit.
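To make the mechanics concrete, here is a purely hypothetical sketch of how a verbosity setting could map to different rewrite instructions sent to the LLM. The actual wording used by the node is not shown here and may differ entirely:

```python
# Hypothetical illustration only: a mapping from verbosity level to a
# rewrite instruction. The node's real instructions are assumptions here.
VERBOSITY_INSTRUCTIONS = {
    "off": None,  # pass the user's prompt through unchanged
    "terse": "Rewrite the input as a concise text-to-image prompt.",
    "medium": (
        "Rewrite the input as a detailed text-to-image prompt, adding "
        "atmosphere, style, and other qualifiers."
    ),
    "baroque": (
        "Rewrite the input as an elaborate, florid, poetic "
        "text-to-image prompt."
    ),
}

def build_llm_prompt(user_prompt: str, verbosity: str) -> str:
    instruction = VERBOSITY_INSTRUCTIONS[verbosity]
    if instruction is None:
        return user_prompt  # verbosity "off": no rewriting
    return f"{instruction}\n\n{user_prompt}"
```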
Because we are working with a small LLM, this node adds another source of variability to your images. You may wish to pass a constant random seed to the denoising step so that the variation you see originates from the prompt.
It is assumed that you already have an Ollama server up and running on the same system you run InvokeAI on.
- Activate the InvokeAI virtual environment ("developer's console") using the `invoke.bat` script, or manually.
- Install the `ollama` and `langchain-community` modules: `pip install ollama langchain-community`
- Install the `gnokit/improve-prompt` model: `ollama pull gnokit/improve-prompt`
- Download this repo and copy the directory `enhance_prompt` and its contents into the `nodes` directory of your InvokeAI root folder. It should look like this:

```
INVOKEAI_ROOT
└── nodes
    └── enhance_prompt
        ├── enhanceprompt.py
        ├── describeimage.py
        ├── common.py
        └── __init__.py
```

- Restart InvokeAI to pick up the new node.
You'll now be able to search the workflow editor for the "Enhanced Prompt" node. Select the LLM model you wish to use, type in a simple prompt, and hook the node's Enhanced Prompt output to the Prompt input of a Prompt node as shown in the screencap above.
The DescribeImage Invocation uses a local Ollama large language model server to describe the contents of any image. You can use the output description as an input prompt, or modify it in some way.
In this screenshot, we've taken the picnicking bears image from the previous Enhance Prompt example, passed it through the Describe Image node, and then used the resulting description as an SDXL image generation prompt:
```
The image depicts a group of anthropomorphic bears dressed in clothing and accessories that resemble human summer
attire, gathered on a picnic blanket in a scenic outdoor setting. They are positioned as if sitting around a picnic,
with a basket of food, including sandwiches and fruit, placed in the center. The bears are various shades of brown
and have features such as ears, claws, and tails that distinguish them from humans. In the background, there is a
mountainous landscape under a clear blue sky. The scene conveys a sense of leisure and relaxation,
typical of a family picnic.
```
Only some Ollama models have computer vision abilities. The one I have used is `llava`. This LLM uses about 8 GB of VRAM, so I recommend setting the "Offload From GPU" option to True. This will load the LLM during image processing and unload it before the next step in the workflow.
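As a rough illustration of the underlying call (not the node's actual code; the prompt wording and file name are assumptions), the `ollama` Python client accepts image bytes alongside a text prompt:

```python
# Minimal sketch of image description via the ollama Python client,
# assuming a local Ollama server with the llava model pulled.
from pathlib import Path

import ollama

def describe(image_path: str, model: str = "llava") -> str:
    image_bytes = Path(image_path).read_bytes()
    response = ollama.generate(
        model=model,
        prompt="Describe the contents of this image.",  # assumed wording
        images=[image_bytes],  # raw bytes; the client handles encoding
        keep_alive=0,          # free VRAM afterwards, as with Offload From GPU
    )
    return response["response"]

print(describe("bears_picnic.png"))  # hypothetical file name
```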
It is assumed that you already have an Ollama server up and running on the same system you run InvokeAI on.
- Activate the InvokeAI virtual environment ("developer's console") using the `invoke.bat` script, or manually.
- Install the `ollama` and `langchain-community` modules: `pip install ollama langchain-community`
- Install the `llava` model: `ollama pull llava`
- Download this repo and copy the directory `enhance_prompt` and its contents into the `nodes` directory of your InvokeAI root folder. It should look like this:

```
INVOKEAI_ROOT
└── nodes
    └── enhance_prompt
        ├── enhanceprompt.py
        ├── describeimage.py
        ├── common.py
        └── __init__.py
```

- Restart InvokeAI to pick up the new node.
You'll now be able to search the workflow editor for the "Describe an Image" node. Select the LLM model you wish to use, upload an image file, and hook the node's Description output to a Prompt node, or whatever you desire.
Copyright (c) 2024 Lincoln Stein. See LICENSE for usage terms.