silent-chen/layout-guidance

Question about Pharse2idx

Closed this issue · 1 comment

Hi, thanks for your awesome work! I have a question about the function Pharse2idx in inference.py. I noticed that you compute object_positions before tokenization and then pass them into compute_ca_loss:

object_positions = Pharse2idx(prompt, phrases)
...
   # Encode Prompt
    input_ids = tokenizer(
            [prompt] * cfg.inference.batch_size,
            padding="max_length",
            truncation=True,
            max_length=tokenizer.model_max_length,
            return_tensors="pt",
        )
...
    # update latents with guidance
    loss = compute_ca_loss(attn_map_integrated_mid, attn_map_integrated_up, bboxes=bboxes,
                           object_positions=object_positions) * cfg.inference.loss_scale

My question is: what if a word gets tokenized into subwords? Then the corresponding object positions will be wrong, right?

Hi, thanks for your interest in our work. The answer to your question is "yes". I think a better way to identify the position of a phrase in the prompt is to first tokenize both the prompt and the phrase, and then find the phrase's token positions.
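A minimal sketch of the suggested fix, matching the phrase's token subsequence inside the tokenized prompt instead of indexing by whitespace-split words. The helper names and the toy subword tokenizer below are hypothetical, just to illustrate the idea; a real implementation would run the same subsequence search over the CLIP tokenizer's output:

```python
def phrase_token_positions(prompt_tokens, phrase_tokens):
    """Find the phrase's positions in the tokenized prompt by
    matching its token subsequence (robust to subword splits)."""
    n = len(phrase_tokens)
    for start in range(len(prompt_tokens) - n + 1):
        if prompt_tokens[start:start + n] == phrase_tokens:
            return list(range(start, start + n))
    return []  # phrase not found in prompt

# Toy subword tokenizer (hypothetical): splits "snowboard" into two
# subwords, which is exactly the case that breaks word-level indexing.
def toy_tokenize(text):
    subwords = {"snowboard": ["snow", "##board"]}
    tokens = []
    for word in text.lower().split():
        tokens.extend(subwords.get(word, [word]))
    return tokens

prompt = "a man riding a snowboard on snow"
phrase = "snowboard"
positions = phrase_token_positions(toy_tokenize(prompt), toy_tokenize(phrase))
print(positions)  # → [4, 5]
```

Here "snowboard" is the fifth word of the prompt but occupies token slots 4 and 5 after subword splitting, so a word-level Pharse2idx would point the cross-attention loss at the wrong attention columns.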