DAMO-NLP-SG/VideoLLaMA2

Can VideoLLaMA2 continue fine-tuning on my own dataset using 32 frames?

zhengrongz opened this issue · 2 comments

Hi! Thanks for your excellent work!
I wonder whether I can use 32 frames per video to fine-tune the model on my own dataset.
If so, do I just need to change the number of sampled frames in the constants?
Looking forward to your reply!

Yes, I believe it is fine to do so. In our internal evaluations, we found that our video models generalize well to longer input (i.e., more input frames), and they usually perform better given longer input.

You can specify this argument explicitly in your own training script to support more video frames.
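For reference, a common way to pick which frames feed the model is uniform temporal sampling. The sketch below is not from the VideoLLaMA2 codebase; it is a minimal, hypothetical illustration of sampling 32 frame indices (the number the question asks about) uniformly across a video, assuming you know the total frame count:

```python
import numpy as np

def sample_frame_indices(total_frames: int, num_frames: int = 32) -> list[int]:
    """Uniformly sample `num_frames` indices from [0, total_frames).

    Hypothetical helper for illustration only; the actual sampling logic
    lives in the repository's data-loading code.
    """
    if total_frames <= num_frames:
        # Short video: keep every frame rather than oversampling.
        return list(range(total_frames))
    # Evenly spaced positions across the full video, rounded to integers.
    positions = np.linspace(0, total_frames - 1, num_frames)
    return [int(round(p)) for p in positions]
```

Whatever frame count you choose here should match the value you pass to the training script so the vision encoder sees a consistent number of frames.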

@lixin4ever OK thank you!