Question about Pharse2idx
Closed this issue · 1 comment
superhero-7 commented
Hi, thanks for your awesome work! I have a question about the function `Pharse2idx` in inference.py. I noticed that you compute `object_positions` before tokenizing the prompt, and then pass it into `compute_ca_loss`:
```python
object_positions = Pharse2idx(prompt, phrases)
...
# Encode Prompt
input_ids = tokenizer(
    [prompt] * cfg.inference.batch_size,
    padding="max_length",
    truncation=True,
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)
...
# update latents with guidance
loss = compute_ca_loss(attn_map_integrated_mid, attn_map_integrated_up, bboxes=bboxes,
                       object_positions=object_positions) * cfg.inference.loss_scale
```
My question is: what if a word gets tokenized into subwords? Then the positions used to pick out the object's attention-map columns would be wrong, right?
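For example (a minimal sketch, assuming the `openai/clip-vit-large-patch14` tokenizer that Stable Diffusion uses; the exact split depends on the BPE vocabulary):

```python
# Minimal sketch: word-level index vs. token-level index.
# Assumption: the CLIP tokenizer used by Stable Diffusion
# ("openai/clip-vit-large-patch14"); the exact subword split is
# vocabulary dependent, but any out-of-vocabulary word gets broken up.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a teddybear next to a red apple"
phrase = "apple"

# Word-level position, i.e. what counting words in the raw prompt gives:
word_idx = prompt.split().index(phrase)  # 6

# Token-level view: if "teddybear" splits into e.g. ["teddy", "bear</w>"],
# every later word shifts by one, so token position 6 is "red", not "apple".
tokens = tokenizer.tokenize(prompt)
print(word_idx, tokens)
```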
silent-chen commented
Hi, thanks for your interest in our work. The answer to your question is "yes". I think a better way to identify the position of a phrase in the prompt is to tokenize first and then locate the phrase's tokens, along the lines of the sketch below.
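Something like this (a hypothetical helper for illustration, not code from this repo; it assumes the HuggingFace `CLIPTokenizer` and returns indices offset by 1 for the BOS token that encoding prepends):

```python
# Hypothetical helper (not part of this repo): tokenize the prompt and the
# phrase separately, then locate the phrase's token ids inside the prompt's
# token ids. Returned indices are shifted by 1 to account for the BOS token
# prepended when the full prompt is encoded for the text encoder.
from transformers import CLIPTokenizer

def phrase_to_token_idx(tokenizer, prompt, phrase):
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    phrase_ids = tokenizer(phrase, add_special_tokens=False).input_ids
    n = len(phrase_ids)
    for start in range(len(prompt_ids) - n + 1):
        if prompt_ids[start:start + n] == phrase_ids:
            return [start + 1 + i for i in range(n)]  # +1 skips BOS
    raise ValueError(f"phrase {phrase!r} not found in prompt {prompt!r}")

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(phrase_to_token_idx(tokenizer, "a teddybear next to a red apple", "apple"))
```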