Are the provided weights based on pre-training on the HowTo100M dataset?
lokeaichirou opened this issue · 2 comments
Hi @ArrowLuo, many thanks for your previous replies, they were very helpful.
May I ask: are the provided weights from pre-training on the HowTo100M dataset? For the video captioning downstream task, do I need to fine-tune the weights by further training on YouCookII (as in main_caption_youcook, using the train function) to get better evaluation results? When I evaluated with the provided weights directly, the captioning scores were very low, instead of the high scores reported in the paper.
Secondly, since youcookii_videos_features.pickle contains S3D features for 1905 videos (nearly all of them), would it be better to do a train-test split on it, fine-tune only on the training portion, and validate and test on the remaining portions?
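For reference, this is a minimal sketch of how I inspected the pickle file (it assumes the usual layout of a dict mapping video id to a NumPy S3D feature array; the exact layout may differ in your copy):

```python
import pickle

# Load the single feature file; it is assumed to map video id -> S3D feature array.
with open("youcookii_videos_features.pickle", "rb") as f:
    features = pickle.load(f)

print("number of videos:", len(features))           # I see ~1905 entries
sample_id = next(iter(features))
print("example id:", sample_id)
print("feature shape:", features[sample_id].shape)  # (num_segments, feature_dim)
```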
For your first question, the answer is YES. Fine-tuning is necessary for better performance.
For the second question, youcookii_videos_features.pickle contains all video features. The train and test splits are assigned via youcookii_train.csv and youcookii_val.csv. We fine-tune on youcookii_train.csv and evaluate on youcookii_val.csv.
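A minimal sketch of how the split assignment can be checked against the single feature file (the "video_id" column name is an assumption and may differ in the csv files you downloaded):

```python
import pickle
import pandas as pd

# The feature pickle is shared; the train/val split lives only in the csv files.
with open("youcookii_videos_features.pickle", "rb") as f:
    features = pickle.load(f)

train_ids = pd.read_csv("youcookii_train.csv")["video_id"]  # column name is an assumption
val_ids = pd.read_csv("youcookii_val.csv")["video_id"]

# Every id in either split should have an entry in the pickle.
print("train ids with features:", sum(vid in features for vid in train_ids), "/", len(train_ids))
print("val ids with features:  ", sum(vid in features for vid in val_ids), "/", len(val_ids))
```

So there is no need to split the pickle itself; fine-tune using the train csv and report results on the val csv.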
Thanks! Very helpful.