PKU-YuanGroup/Video-LLaVA

Inference on more than 8 frames

Nihel01 opened this issue · 2 comments

Would it be possible to run inference (or even training?) using more than 8 frames from a video?

If it's possible, could you point out where to control this? I have found multiple configs for the number of frames, but I'm not sure whether a single parameter somewhere controls all of them. Thanks.

The main thing is to change the output of the video encoder. If your video encoder supports more frames, then feed all of its output to the LLM.
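For illustration only, here is a minimal sketch of picking evenly spaced frame indices when you want more than 8 frames. This is not code from this repository: `sample_frame_indices` and its parameters are hypothetical names, and you would still need to pass the chosen frames through whatever video loader and encoder you use. Keep in mind that more frames means more visual tokens for the LLM, so the result has to fit in its context window.

```python
def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Return `num_frames` evenly spaced frame indices in [0, total_frames - 1].

    Hypothetical helper for illustration; not part of Video-LLaVA.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")

    # Cannot sample more distinct frames than the video contains.
    num_frames = min(num_frames, total_frames)

    if num_frames == 1:
        # A single frame: take the middle of the video.
        return [(total_frames - 1) // 2]

    # Integer arithmetic keeps the first and last indices exact
    # (0 and total_frames - 1) and avoids floating-point rounding.
    last = total_frames - 1
    return [(i * last) // (num_frames - 1) for i in range(num_frames)]


if __name__ == "__main__":
    # Example: pick 16 frames from a 100-frame clip.
    print(sample_frame_indices(100, 16))
```

The indices can then be used to select frames from a decoded video before they are preprocessed and encoded; whether the encoder accepts that many frames depends on the model you are using.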

Check this reply: #123 (comment)