silent-chen/layout-guidance

Question about Pharse2idx

Closed this issue · 1 comment

Hi, thanks for your awesome work! I have a question about the function Pharse2idx in inference.py. I noticed that you compute object_positions before tokenization and then pass them into compute_ca_loss:

object_positions = Pharse2idx(prompt, phrases)
...
   # Encode Prompt
    input_ids = tokenizer(
            [prompt] * cfg.inference.batch_size,
            padding="max_length",
            truncation=True,
            max_length=tokenizer.model_max_length,
            return_tensors="pt",
        )
...
    # update latents with guidance
    loss = compute_ca_loss(attn_map_integrated_mid, attn_map_integrated_up, bboxes=bboxes,
                           object_positions=object_positions) * cfg.inference.loss_scale

My question is: what if a word gets tokenized into subwords? Then the corresponding object positions will be wrong, right?

Hi, thanks for your interest in our work. The answer to your question is "yes". I think a better way to identify the position of a phrase in the prompt is to first tokenize both the prompt and the phrase, and then find the phrase's token positions.
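A minimal sketch of the suggested fix, matching the phrase's token subsequence inside the tokenized prompt instead of indexing by whitespace-split words. The helper names and the toy subword tokenizer below are hypothetical, just to illustrate the idea; a real implementation would run the same subsequence search over the CLIP tokenizer's output:

```python
def phrase_token_positions(prompt_tokens, phrase_tokens):
    """Find the phrase's positions in the tokenized prompt by
    matching its token subsequence (robust to subword splits)."""
    n = len(phrase_tokens)
    for start in range(len(prompt_tokens) - n + 1):
        if prompt_tokens[start:start + n] == phrase_tokens:
            return list(range(start, start + n))
    return []  # phrase not found in prompt

# Toy subword tokenizer (hypothetical): splits "snowboard" into two
# subwords, which is exactly the case that breaks word-level indexing.
def toy_tokenize(text):
    subwords = {"snowboard": ["snow", "##board"]}
    tokens = []
    for word in text.lower().split():
        tokens.extend(subwords.get(word, [word]))
    return tokens

prompt = "a man riding a snowboard on snow"
phrase = "snowboard"
positions = phrase_token_positions(toy_tokenize(prompt), toy_tokenize(phrase))
print(positions)  # → [4, 5]
```

Here "snowboard" is the fifth word of the prompt but occupies token slots 4 and 5 after subword splitting, so a word-level Pharse2idx would point the cross-attention loss at the wrong attention columns.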