MKLab-ITI/visil

The way to split the dataset into relevant and irrelevant data

wch982167 opened this issue · 2 comments

Hi there, I am confused about how to split the dataset into relevant and irrelevant data.
Taking the FIVR-200K dataset as an example, is the split based on the topic or headline in event.json (e.g. relevant data have the same topic)? Or is it based on the annotations (ND, DS, CS, IS -> relevant, others -> irrelevant)?
If it is based on the annotations, doesn't that mean the ground truth is known before evaluation?
If it is based on the topic/headline, then how do the other datasets (CC_WEB_VIDEO and EVVE) work?

Could someone help to explain it? Thanks a lot!

Hi @wch982167. Thanks for your questions!

We do not split the mentioned datasets into training/validation/test sets. Instead, we train the network once using the VCDB dataset and then evaluate it on the other datasets, i.e., FIVR-200K, CC_WEB_VIDEO, and EVVE. VCDB contains annotations for the video segments in the dataset that are copies of each other. Therefore, distinguishing between relevant and irrelevant videos is straightforward.
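Under that setup, the annotations of an evaluation dataset would only be needed to score the ranked results, not to train the network. Below is a minimal sketch of that scoring step using average precision. It is not the evaluation code of this repository: the function name `average_precision`, the video ids, and the toy `annotations` dictionary are hypothetical, and the set of labels treated as relevant is taken from the question above as an assumption (the actual set depends on the retrieval task).

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one query's ranking.

    ranked_ids: video ids sorted by descending similarity to the query.
    relevant_ids: ids annotated as relevant to the query (ground truth).
    """
    relevant = set(relevant_ids)
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant:
            hits += 1
            # Precision at the rank of each relevant video.
            precision_sum += hits / rank
    # Normalise by the number of relevant videos (assumed convention).
    return precision_sum / len(relevant) if relevant else 0.0


# Hypothetical annotations for one query: video id -> label.
annotations = {"v1": "ND", "v2": "DS", "v4": "CS"}

# Assumption: these labels count as relevant for the task at hand.
relevant_labels = {"ND", "DS", "CS", "IS"}
relevant = {vid for vid, label in annotations.items() if label in relevant_labels}

# Hypothetical ranking produced by the trained network for that query.
ranking = ["v1", "v3", "v2", "v5", "v4"]
ap = average_precision(ranking, relevant)
print(f"AP = {ap:.4f}")
```

The ranking itself is produced without looking at the labels; they enter only when the ranking is compared against the ground truth.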

The only dataset we split into training/validation/test sets is ActivityNet, following this work. In this case, we train the network on the training set and evaluate it on the test set. Video pairs annotated with the same class label are considered relevant, and pairs annotated with different labels are considered irrelevant.
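The class-label rule for ActivityNet can be sketched as follows. This is only an illustration of the rule described above, not code from this repository; the helper `label_pairs`, the video ids, and the class names are hypothetical.

```python
from itertools import combinations


def label_pairs(video_classes):
    """Mark every unordered video pair as relevant (True) or irrelevant (False).

    video_classes: dict mapping video id -> class label.
    A pair is relevant when both videos share the same class label.
    """
    pairs = {}
    for a, b in combinations(sorted(video_classes), 2):
        pairs[(a, b)] = video_classes[a] == video_classes[b]
    return pairs


# Hypothetical toy subset: video id -> class label.
video_classes = {"a": "surfing", "b": "surfing", "c": "cooking"}
pairs = label_pairs(video_classes)

for (a, b), is_relevant in pairs.items():
    print(a, b, "relevant" if is_relevant else "irrelevant")
```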

Thanks for your reply!!