A ComfyUI node that adds BLIP captioning to CLIPTextEncode

Announcement: BLIP is now officially integrated into CLIPTextEncode

Dependencies

fairscale>=0.4.4 (not bundled with ComfyUI; must be installed separately)
transformers==4.26.1 (already bundled with ComfyUI)
timm>=0.4.12 (already bundled with ComfyUI)
GitPython (already bundled with ComfyUI)
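To confirm that the Python environment used by ComfyUI meets these requirements, a small check script can help. The sketch below is not part of this repository: it assumes the pip distribution names match the list above and uses only the standard library (`importlib.metadata`, Python 3.8+). Its version parser is a deliberate simplification that compares numeric components only and is not a full PEP 440 implementation; the `packaging` library would be more robust. Run it with the same interpreter ComfyUI uses (for the portable build, `python_embeded\python.exe`).

```python
# Hypothetical dependency check; not shipped with comfy_clip_blip_node.
from importlib.metadata import PackageNotFoundError, version

# Requirement table mirroring the dependency list: name -> (operator, version).
REQUIREMENTS = {
    "fairscale": (">=", "0.4.4"),
    "transformers": ("==", "4.26.1"),
    "timm": (">=", "0.4.12"),
    "GitPython": (None, None),  # no version constraint given
}


def parse(v: str) -> tuple:
    """Turn '4.26.1' into (4, 26, 1); stops at the first non-numeric component.

    Simplification: suffixes such as '+cu118' or '.dev0' are ignored.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op, required) -> bool:
    """Compare two version strings after padding them to equal length."""
    if op is None:
        return True
    a, b = parse(installed), parse(required)
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    if op == ">=":
        return a >= b
    if op == "==":
        return a == b
    raise ValueError(f"unsupported operator: {op}")


def check(requirements=REQUIREMENTS) -> dict:
    """Return a status string per package: 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, (op, req) in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if satisfies(installed, op, req):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, need {op}{req}"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name}: {status}")
```

A "missing" entry for fairscale is expected before the installation step below has been run.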

Local Installation

Inside ComfyUI_windows_portable\python_embeded, run:

python.exe -m pip install fairscale

Then, inside ComfyUI_windows_portable\ComfyUI\custom_nodes, run:

git clone https://github.com/paulo-coronado/comfy_clip_blip_node

Google Colab Installation

Add a cell with the following code (it assumes the notebook's working directory is the ComfyUI root, which contains custom_nodes):

!pip install fairscale
!cd custom_nodes && git clone https://github.com/paulo-coronado/comfy_clip_blip_node

How to use

Add the CLIPTextEncodeBLIP node.
Connect an image to the node and set values for min_length and max_length, which bound the length of the generated caption.
Optional: to embed the BLIP caption in a larger prompt, use the keyword BLIP_TEXT as a placeholder (e.g. "a photo of BLIP_TEXT, medium shot, intricate details, highly detailed").
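The BLIP_TEXT keyword works as a placeholder that is filled in with the caption BLIP generates for the connected image. The sketch below only illustrates that substitution; `embed_caption` and the sample caption are hypothetical, not the node's actual code, and it does not model what the node does when the keyword is absent from the prompt.

```python
# Illustrative sketch of the BLIP_TEXT placeholder; names are hypothetical.
KEYWORD = "BLIP_TEXT"


def embed_caption(prompt: str, caption: str, keyword: str = KEYWORD) -> str:
    """Replace every occurrence of `keyword` in `prompt` with the caption.

    str.replace substitutes all occurrences and leaves the rest of the
    prompt (style tags, commas, etc.) untouched.
    """
    return prompt.replace(keyword, caption.strip())


prompt = "a photo of BLIP_TEXT, medium shot, intricate details, highly detailed"
caption = "a woman sitting on a bench"  # hypothetical BLIP output
print(embed_caption(prompt, caption))
# a photo of a woman sitting on a bench, medium shot, intricate details, highly detailed
```

Placing the keyword inside a longer prompt lets the caption describe the image content while the surrounding tags control style and framing.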

Acknowledgement

The implementation of CLIPTextEncodeBLIP relies on resources from BLIP, ALBEF, Hugging Face Transformers, and timm. We thank the original authors for open-sourcing their work.