Any plans to support video and audio?

Question

Any plans to support video and audio?

Opened this issue 2 months ago · 1 comment

Ovis is really good. Could you please support video and audio?

Answer 1 · 2024-11-26T11:33:21.000Z

Thank you for your positive feedback on Ovis.

It's common practice to extract multiple frames from a video to create a multi-image input. While Ovis1.6 is primarily trained on single-image samples, it also supports multi-image inputs. An example is available at: #25
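For illustration only, here is a minimal sketch of the frame-extraction step described above. It is not code from the Ovis repository: the helper names `sample_frame_indices` and `extract_frames`, the default of 8 frames, and the use of OpenCV and Pillow are all assumptions. The resulting list of images would then be passed to the model as a multi-image input, following the example in #25.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to num_frames indices spread evenly across the video (hypothetical helper)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    num_frames = min(num_frames, total_frames)
    # Take the midpoint of each of num_frames equal-width segments.
    return [int((i + 0.5) * total_frames / num_frames) for i in range(num_frames)]


def extract_frames(video_path: str, num_frames: int = 8):
    """Decode the sampled frames as RGB PIL images (assumes opencv-python and Pillow)."""
    import cv2  # imported lazily so the index helper works without OpenCV
    from PIL import Image

    cap = cv2.VideoCapture(video_path)
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in sample_frame_indices(total, num_frames):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
            ok, frame = cap.read()
            if not ok:
                continue  # skip frames that fail to decode
            # OpenCV decodes to BGR; convert to RGB before building the PIL image.
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        return frames
    finally:
        cap.release()
```

Uniform sampling is only one option; for long videos you may want fewer frames or scene-based selection, since each frame adds to the input length.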

In addition, we are currently working on incorporating video data into our training process and plan to improve video processing capabilities in future versions.