kohjingyu/fromage

[RET] Embedding

pUmpKin-Co opened this issue · 3 comments

Hi~ Thanks for your exciting work, it inspires me a lot.
I have a few questions after reading your code and paper:

  • Are there any quantitative results on the [RET] embedding (e.g., a table)?
  • I don't fully understand how updates to the [RET] embedding are kept separate from updates to the other token embeddings. I noticed that the gradients of the other embeddings are masked off, but requires_grad should be False for all of the LM's parameters at the start of training (since the LM is frozen/in eval mode). In that case, shouldn't none of the embeddings receive gradients?
  • What is the purpose of normalizing the embedding?

I hope you can answer these questions. Thank you!

My bad. Question 2 has already been answered in #6.
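
For reference, here is a minimal sketch of that kind of gradient masking: the embedding matrix is made trainable again and a hook zeros the gradient for every row except the [RET] token's. The sizes and `ret_token_idx` are placeholders, not the repo's actual values.

```python
import torch
import torch.nn as nn

# Minimal sketch: train only the [RET] row of the input embedding matrix.
vocab_size, hidden_dim = 32001, 4096    # placeholder sizes
ret_token_idx = vocab_size - 1          # hypothetical index of the added [RET] token

embed = nn.Embedding(vocab_size, hidden_dim)
embed.weight.requires_grad_(True)       # the embedding matrix must be trainable

def zero_other_rows(grad):
    # Keep the gradient for the [RET] row only; zero everything else.
    mask = torch.zeros_like(grad)
    mask[ret_token_idx] = 1.0
    return grad * mask

embed.weight.register_hook(zero_other_rows)
```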

Thanks for your kind words!

Are there any quantitative results on the [RET] embedding (e.g., a table)?

The RET embedding is used for retrieval, so I think the retrieval recall@k results in the paper are relevant: Table 1 for VIST, Table 2 for VisDial, and Table 3 in the appendix on MS-COCO.
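
For context, Recall@k there is computed in the standard way, roughly as in this sketch (an illustration, not the repo's evaluation code; it assumes one ground-truth image per query, stored at the same row index, and L2-normalized embeddings):

```python
import torch

def recall_at_k(query_emb: torch.Tensor, image_emb: torch.Tensor, k: int) -> float:
    sims = query_emb @ image_emb.t()                      # (N, N) cosine similarities
    topk = sims.topk(k, dim=1).indices                    # k most similar images per query
    targets = torch.arange(query_emb.size(0)).unsqueeze(1)
    hits = (topk == targets).any(dim=1).float()           # did the true image land in the top-k?
    return hits.mean().item()
```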

What is the purpose of normalizing the embedding?

This is mostly to ensure that it has roughly the same magnitude as the other token embeddings, so it doesn't become too out-of-distribution (OOD). I didn't run ablations to check whether this is necessary, however; it may work similarly even if you don't normalize the [RET] embedding.
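
As a rough illustration (an assumption about the approach, not the repo's exact code), rescaling the [RET] row to the average norm of the other token embeddings would look something like:

```python
import torch
import torch.nn as nn

def rescale_ret_embedding(embed: nn.Embedding, ret_token_idx: int) -> None:
    # Match the [RET] row's norm to the average norm of all other rows.
    with torch.no_grad():
        w = embed.weight                                   # (vocab_size, hidden_dim)
        other = torch.cat([w[:ret_token_idx], w[ret_token_idx + 1:]])
        target_norm = other.norm(dim=1).mean()             # typical token embedding norm
        w[ret_token_idx] *= target_norm / w[ret_token_idx].norm()
```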

Hope that helps!

Thanks for your kind response!