mlvlab/Flipped-VQA

Number of frames, its use in the code, and max_feats = 10 for video features

Closed this issue · 1 comments

From the paper, we know that 10 frames from each video are used to generate the weight file. Is that the only place where the number of frames per video is relevant, or is the number of frames also used later in the code?
As far as I can see in the code, only 10 tensors (controlled by the max_feats argument) are selected from the CLIP-ViT weight file. Is max_feats set to 10 because of the 10 frames, or is it an independent decision?

Thank you for your interest in our work.
Yes, args.max_feats is the only place that adjusts the number of video frames.
You can control the number of frames by changing args.max_feats.
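
For readers wondering how a setting like max_feats can cap the number of frame features, here is a minimal sketch. It is not the repository's actual code: the helper names sample_frame_indices and select_features, the evenly spaced sampling, and the zero padding are all assumptions made for illustration. Per-frame features are represented as plain Python lists so the example runs without extra dependencies.

```python
def sample_frame_indices(num_frames, max_feats):
    """Return at most max_feats evenly spaced frame indices (hypothetical helper)."""
    if max_feats <= 0:
        raise ValueError("max_feats must be positive")
    if num_frames <= max_feats:
        # Fewer frames than requested: keep all of them.
        return list(range(num_frames))
    # Take the midpoint of each of max_feats equal-length segments.
    return [(2 * i + 1) * num_frames // (2 * max_feats) for i in range(max_feats)]


def select_features(features, max_feats=10):
    """Pick max_feats per-frame feature vectors, zero-padding short videos.

    `features` is a list of per-frame vectors (lists of floats).
    Returns (selected, valid_len), where valid_len counts the real frames.
    This padding scheme is an assumption, not taken from the repository.
    """
    indices = sample_frame_indices(len(features), max_feats)
    selected = [features[i] for i in indices]
    valid_len = len(selected)
    dim = len(features[0]) if features else 0
    # Pad with zero vectors so the output always has max_feats entries.
    selected += [[0.0] * dim for _ in range(max_feats - valid_len)]
    return selected, valid_len


if __name__ == "__main__":
    video = [[float(i)] * 4 for i in range(100)]  # 100 frames, 4-dim features
    picked, n = select_features(video, max_feats=10)
    print(len(picked), n)
```

Under these assumptions, changing max_feats changes how many frame features reach the model, which matches the reply above; whether the stored feature file holds enough frames for a larger value would need to be checked against the extraction step.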