Are the provided weights based on pre-training on the HowTo100M dataset?
lokeaichirou opened this issue · 2 comments
Hi @ArrowLuo, many thanks for your previous replies, they were very helpful.
May I ask: are the provided weights from pre-training on the HowTo100M dataset? For the video captioning downstream task, do I need to fine-tune the weights by further training on YouCookII (as in main_caption_youcook, using the train function) to get better evaluation results? When I evaluated with the provided weights directly, the captioning scores were very low, instead of the high scores reported in the paper.
Secondly, since youcookii_videos_features.pickle contains S3D features for 1905 videos (nearly all of them), would it be better to do a train-test split on it, fine-tune only on the training portion, and validate and test on the remaining portions?
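For reference, this is a minimal sketch of how I inspected the pickle file (it assumes the usual layout of a dict mapping video id to a NumPy S3D feature array; the exact layout may differ in your copy):

```python
import pickle

# Load the single feature file; it is assumed to map video id -> S3D feature array.
with open("youcookii_videos_features.pickle", "rb") as f:
    features = pickle.load(f)

print("number of videos:", len(features))           # I see ~1905 entries
sample_id = next(iter(features))
print("example id:", sample_id)
print("feature shape:", features[sample_id].shape)  # (num_segments, feature_dim)
```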
For your first question, the answer is YES. Fine-tuning is necessary for better performance.
For the second question, youcookii_videos_features.pickle contains all video features. The train and test splits are assigned via youcookii_train.csv and youcookii_val.csv. We fine-tune on youcookii_train.csv and evaluate on youcookii_val.csv.
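A minimal sketch of how the split assignment can be checked against the single feature file (the "video_id" column name is an assumption and may differ in the csv files you downloaded):

```python
import pickle
import pandas as pd

# The feature pickle is shared; the train/val split lives only in the csv files.
with open("youcookii_videos_features.pickle", "rb") as f:
    features = pickle.load(f)

train_ids = pd.read_csv("youcookii_train.csv")["video_id"]  # column name is an assumption
val_ids = pd.read_csv("youcookii_val.csv")["video_id"]

# Every id in either split should have an entry in the pickle.
print("train ids with features:", sum(vid in features for vid in train_ids), "/", len(train_ids))
print("val ids with features:  ", sum(vid in features for vid in val_ids), "/", len(val_ids))
```

So there is no need to split the pickle itself; fine-tune using the train csv and report results on the val csv.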
Thanks! Very helpful.