OpenGVLab/unmasked_teacher

Retrieval dataset process approach

Seolen opened this issue · 1 comment

Seolen commented

Thanks for your impressive work. I have a question about evaluating video-text retrieval: in datasets such as MSVD and MSRVTT, each video is paired with multiple captions. How do you handle this for retrieval?

Yes, in the training data there are multiple captions per video. During training we do not treat this specially; we simply fine-tune the models with the VTC (video-text contrastive) and VTM (video-text matching) losses.
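In other words, each (video, caption) pair can just be treated as an independent training sample. A minimal sketch of that flattening step (the `"video"`/`"captions"` field names here are illustrative assumptions, not the repo's actual annotation schema):

```python
# Sketch: expand multi-caption annotations into independent (video, caption)
# training pairs, so the VTC/VTM losses see each pair as its own sample.
# Field names ("video", "captions") are assumptions, not the repo's schema.

def flatten_pairs(annotations):
    """Turn [{'video': v, 'captions': [c1, c2, ...]}, ...] into a flat
    list of (video, caption) pairs, one pair per caption."""
    pairs = []
    for ann in annotations:
        for cap in ann["captions"]:
            pairs.append((ann["video"], cap))
    return pairs

annotations = [
    {"video": "vid0001.mp4", "captions": ["a dog runs", "a puppy is running"]},
    {"video": "vid0002.mp4", "captions": ["a man cooks"]},
]
pairs = flatten_pairs(annotations)
# 3 pairs total: two for vid0001.mp4, one for vid0002.mp4
```

One side effect of this approach is that two captions of the same video can land in one contrastive batch and be treated as negatives of each other, which the answer above accepts rather than correcting for.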

In the test data, there is only one caption per video, so evaluation is unaffected.
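With a one-to-one test split, standard retrieval metrics reduce to checking the diagonal of the text-video similarity matrix. A hedged sketch (assuming text `i`'s ground-truth video is index `i`, which holds when each video keeps exactly one caption):

```python
import numpy as np

def recall_at_1(sim):
    """Text-to-video R@1 from a (num_texts, num_videos) similarity matrix,
    assuming text i's ground-truth video is index i (one caption per video)."""
    preds = sim.argmax(axis=1)          # best-matching video per text query
    gt = np.arange(sim.shape[0])        # ground truth is the diagonal
    return float((preds == gt).mean())

sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.3, 0.4, 0.6]])
# Every text ranks its own video first, so R@1 is 1.0 here.
```

Video-to-text R@1 is the same computation on `sim.T`.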