microsoft/UniVL

How should I set the value in youcookii_videos_features.pickle when fine-tuning with single transcript as input?

lokeaichirou opened this issue · 2 comments

Hi, @ArrowLuo. I want to fine-tune the model with only the transcript as input, so I generated another youcookii_videos_features .pickle in which every element of each video's feature array (shape number_of_frames * 1024) is set to 'nan'. With that, the training loss becomes nan.
So I set the features to zero instead. Do you agree with this modification?
May I also ask how you handled the case of text-only (single-modality) input in your experiments?

@lokeaichirou I agree with your modification of setting all video features to zero. We did the same thing in our experiments. Besides, you can set --max_frames to 1 to reduce the memory cost. Good luck~
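For reference, a minimal sketch of building a zeroed feature pickle along these lines. It assumes the pickle is a dict mapping each video ID to a float array of shape (number_of_frames, 1024), as described above; the file names and the dict layout are assumptions, not the repository's documented format.

```python
import pickle
import numpy as np

# Hypothetical paths; adjust to your local dataset layout.
SRC = "youcookii_videos_features.pickle"
DST = "youcookii_videos_features_zeroed.pickle"

with open(SRC, "rb") as f:
    features = pickle.load(f)  # assumed: {video_id: ndarray of shape (num_frames, 1024)}

# Replace every feature array with zeros of the same shape and dtype,
# so the text-only fine-tuning run never feeds NaNs into the video branch.
zeroed = {vid: np.zeros_like(feat) for vid, feat in features.items()}

with open(DST, "wb") as f:
    pickle.dump(zeroed, f)
```

If you also pass --max_frames 1 as suggested above, presumably only one frame per video is kept anyway, so the zeroed arrays could just as well be shrunk to a single row to save disk space.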

Thanks!