daxcay/ComfyUI-JDCN

Node Request: caption dataset using reference subset image folder

rafstahelin opened this issue · 11 comments

I am trying to set up a workflow that will allow me to selectively insert a caption token from a subset group of image captions into the full dataset, but it should only concatenate to the dataset txt files whose filenames match the subset filenames.

Very good use case for training an SD model on a concept. Would this be easily achievable with some sort of JDCN variant node?

Your node request will be beneficial for everyone, but to ensure its utility, please provide an example. In the example, include the input connections the node should have, how the dataset looks, the process (an example of what the result will look like), and the output connections the node should have. If the example you wish to provide is confidential, please indicate so.

You can contact me on: https://discord.com/invite/aGQmh62T

Objective:

The aim of this process is to append a specific "concept" to the caption text files of a given "dataset" in batches. The determination of which caption files receive this concept addition is based on a selection of images made by the user. By copying a group of images from the dataset, the user enables the system to generate a list of captions to be modified.

Purpose:

This process streamlines the integration of semantic concepts into training captions. Currently, there are no existing tools capable of performing this task. Leveraging the advanced vision models within ComfyUI, such as VLM and LMVision nodes, we can efficiently and accurately caption dataset images. However, further customization is necessary to refine these captions. Grouping captions allows for the targeted addition of specific concepts to a set of images. Users select images by copying those relevant to the desired concept into a "subset" folder. Subsequently, the user provides the subset path to the system and selects the position at which the concept should be inserted into the caption.

Inputs:

"subset" directory: Contains image files in formats such as *.jpg, *.png, or *.webp.
"dataset" directory: Holds caption text files in *.txt format.
Concept: The text to be appended to the captions.

Actions:

Insert the concept at the beginning of the caption.
Insert the concept after the first clause (following the first comma) in the caption.
Insert the concept after the second clause (following the second comma) in the caption.
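The three actions above all amount to inserting the concept at a clause index in a comma-separated caption. A minimal sketch in Python (the helper name and signature are my own illustration, not part of JDCN):

```python
def insert_concept(caption: str, concept: str, position: int) -> str:
    """Insert `concept` as a clause in a comma-separated caption.

    position 0 = action 1 (beginning), 1 = action 2 (after first
    comma), 2 = action 3 (after second comma).
    """
    clauses = [c.strip() for c in caption.split(",")]
    # Clamp so a position past the last clause just appends at the end.
    clauses.insert(max(0, min(position, len(clauses))), concept)
    return ", ".join(clauses)
```

For example, `insert_concept("photo of woman, blue jeans", "sunny-lighting", 1)` would yield `"photo of woman, sunny-lighting, blue jeans"`.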

Example of subset image files:

filename: filename1.jpg, filename2.jpg, filename3.jpg
file type: images

Example of dataset txt files:

filename: filename1.txt, filename2.txt, filename3.txt, filename4.txt
filename bases match the subset images' filename bases
file type: text files
content: caption1, caption2, caption3...
example of a caption: "photo of young blonde woman with curly messy hair looking down pensively, white sleeveless top, blue jeans, hands in pockets, freckles, soft natural lighting, dreamy contemplative mood, light grey background, medium shot, slim build, casual style, tousled hair, fair skin, mid 20s, serious expression, minimalist composition"

Result of action 1:

content: concept, caption1 (dataset), caption2 (dataset), caption3 (dataset)...
example of caption: "sunny-lighting, photo of young blonde woman with curly messy hair looking down pensively, white sleeveless top, blue jeans, hands in pockets, freckles, soft natural lighting, dreamy contemplative mood, light grey background, medium shot, slim build, casual style, tousled hair, fair skin, mid 20s, serious expression, minimalist composition"

Result of action 2:

content: caption1 (dataset), concept, caption2 (dataset), caption3 (dataset)...
example of caption: "photo of young blonde woman with curly messy hair looking down pensively, sunny-lighting, white sleeveless top, blue jeans, hands in pockets, freckles, soft natural lighting, dreamy contemplative mood, light grey background, medium shot, slim build, casual style, tousled hair, fair skin, mid 20s, serious expression, minimalist composition"

Result of action 3:

content: caption1 (dataset), caption2 (dataset), concept, caption3 (dataset)...
example of caption: "photo of young blonde woman with curly messy hair looking down pensively, white sleeveless top, sunny-lighting, blue jeans, hands in pockets, freckles, soft natural lighting, dreamy contemplative mood, light grey background, medium shot, slim build, casual style, tousled hair, fair skin, mid 20s, serious expression, minimalist composition"
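Tying the inputs and actions together, the whole batch step could be sketched as a small script. The function below is hypothetical and only illustrates the match-by-filename-base logic described above: every image in the subset folder selects the dataset .txt file with the same base name, and the concept is inserted at the chosen clause position.

```python
import pathlib

def apply_concept(subset_dir, dataset_dir, concept, position):
    """For each image in subset_dir, insert `concept` into the matching
    dataset caption file at clause index `position` (0 = start)."""
    subset = pathlib.Path(subset_dir)
    dataset = pathlib.Path(dataset_dir)
    for image in subset.iterdir():
        if image.suffix.lower() not in {".jpg", ".png", ".webp"}:
            continue
        # Match the caption file by filename base (stem).
        txt = dataset / (image.stem + ".txt")
        if not txt.exists():
            continue
        clauses = [c.strip() for c in txt.read_text().split(",")]
        clauses.insert(min(position, len(clauses)), concept)
        txt.write_text(", ".join(clauses))
```

Dataset files with no matching subset image (e.g. filename4.txt above) are left untouched.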

Hi, the current nodes in JDCN can do the above task with a mix of a few other outside nodes.

Use the workflow below.

Update JDCN

Caption.json

Please confirm if it works?

Hey, I've tested it out.

  1. The saved file is being saved without the extension. I suppose that is for testing purposes, to avoid overwriting.
  2. I am not sure how to tell the workflow which position to use: position 1, 2, or 3?
  3. How do you batch all the files in the reference image folder? The idea is for all the images in the reference folder to dictate which text files get the concept, one by one.

captioner

Hi, please update JDCN. The name-saving issue was fixed yesterday.

Also, to go to the next image, change this node from fixed to increment:

image

Ok, seems to be working partially.

  1. Instead of overwriting the file, it is concatenating the new caption to the existing caption. So I get the existing caption twice: once without the concept, and once with the concept.

  2. I am confused about how to select the position for the caption. Should I simply unplug or mute the savers that are not desired? In short, how do I set the position of the concept to be inserted, say position 1, 2, or 3? Sorry, not sure if that's on me.

Hi,
You have not updated JDCN.

Please use ComfyUI Manager to update, or delete my node and install it again.

This controls where to place the caption

image

This is the new TXT File Saver I am talking about.

image

Please update from your side.
It will work.

Hi,
I deleted the extension and reinstalled, so I can see the new saver modes. I am setting it to overwrite.
OK, it seems to be working. I am muting the saver positions I don't want. Got it.

workflow (3)
image

But I am getting an error, and I'm not sure why.

  1. Do I always need to set Select File Number to See to 1?
  2. Batch Count on the floating panel should be set to the number of reference images, right?
  3. JDCN Anyfileselector: I am not sure whether to use fixed or increment. I am getting an error but not sure why.

I am testing with these files. Dropbox link:
https://www.dropbox.com/scl/fo/g1pf5xdqrb75rlhfqn6z6/AODJhblaJZQCc0C5YsgpTaY?rlkey=j1rdhfhqvik3zo2lmxay3dr60&dl=0
When I run it, only 2 of 4 captions get the concept at position 2 (according to the file saver that I have set; the others are muted). Am I not supposed to mute the savers?

Hi,

It would be better if you connect with me on Discord so we can discuss it on a call.

Then I will post the update here.

https://discord.com/invite/aGQmh62T

The functionality has actually been magnificently added as a node by JDCN. I am really thankful for this. Here's what it looks like:

The basic purpose of this is to add certain words or concepts to training captions, with control over their position in the text file. If you write a digit after the string in the TagManipulatobyNames node, it will insert at the correct position. Well, I hope JDCN will add a Readme for it...

image