niluthpol/multimodal_vtt

Questions about MSVD


The paper says "where they randomly chose 5 ground-truth sentences per video. We use the same setting when we compare with that approach". Does this mean the training, validation, and test sets each use 5 randomly chosen sentences per video?
So not all of the available sentences are used in the training, validation, and test sets?

Five ground-truth sentences per video are used when comparing with LJRV [24], because LJRV picked five ground-truth descriptions per video. See Table 2 (partition used by LJRV [24]).
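
For anyone reproducing that setting, here is a minimal sketch of drawing five random ground-truth sentences per video. It is not taken from this repository: the function name `sample_captions`, the dictionary layout, and the fixed seed are all assumptions made for illustration, and the toy captions are not real MSVD annotations.

```python
import random


def sample_captions(captions_by_video, k=5, seed=0):
    """Pick up to k captions per video, uniformly at random without replacement.

    Hypothetical helper: captions_by_video maps a video id to its list of
    ground-truth sentences. This layout is an assumption, not the repo's format.
    """
    rng = random.Random(seed)  # fixed seed so the chosen subset is reproducible
    sampled = {}
    for video_id in sorted(captions_by_video):  # sorted for a stable draw order
        captions = captions_by_video[video_id]
        # Videos with fewer than k captions keep everything they have.
        sampled[video_id] = rng.sample(captions, min(k, len(captions)))
    return sampled


# Toy data for illustration only.
toy = {
    "vid_a": [f"caption {i}" for i in range(12)],
    "vid_b": ["only one", "and another"],
}
subset = sample_captions(toy)
```

Which splits the sampling is applied to is not stated in this thread, so check the paper's Table 2 setup before applying the sketch to a particular split.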