kohjingyu/fromage

Should last_embedding_idx be caption_len - 2?

sijeh opened this issue · 2 comments

sijeh commented

last_embedding_idx = caption_len - 1 # -1 to retrieve the token before the eos token

Hello kohjingyu, thanks for your great work!

I'm a bit confused about the variable last_embedding_idx in models.py. The input caption seems to be [..., [RET], [EOS]], so caption_len - 1 refers to the index of [EOS], and last_hidden_state[i, last_embedding_idx[i], :] is then the output hidden state of the [EOS] token, which is used to retrieve images. Is anything wrong here? Please point it out if I've misunderstood the intent.

Best regards.

kohjingyu commented

Thanks for your interest! This is because the OPT models and tokenizer do not add the [EOS] token by default, so the last token is [RET] during retrieval training, which is why it's caption_len - 1. The comment there is a bit misleading, which I apologize for!
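A minimal sketch of the indexing in question (the token ids here are hypothetical, not FROMAGe's actual vocabulary): since the OPT tokenizer appends no [EOS], the appended [RET] is the final token, and caption_len - 1 indexes it.

```python
# Hypothetical token ids for illustration only.
BOS, RET = 2, 50266

# Caption tokens as they look during retrieval training:
# no [EOS] is added by the tokenizer, [RET] is appended last.
caption = [BOS, 100, 200, 300, RET]
caption_len = len(caption)

# -1 indexes the final token, which is [RET] (not [EOS]).
last_embedding_idx = caption_len - 1
assert caption[last_embedding_idx] == RET
```

So last_hidden_state[i, last_embedding_idx[i], :] picks out the hidden state of [RET], which is what gets used for retrieval.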

sijeh commented

Thanks for your interest! This is because the OPT models and tokenizer do not add the [EOS] token by default, so the last token is [RET] during retrieval training, which is why it's caption_len - 1. The comment there is a bit misleading, which I apologize for!

I see, thanks for your kind reply.