Any plans to support video and audio?
thesby commented
Ovis is really good. Could you please support video and audio?
runninglsy commented
Thank you for your positive feedback on Ovis.
It's common practice to extract multiple frames from a video and pass them as a multi-image input. While Ovis1.6 is primarily trained on single-image samples, it also supports multi-image inputs. An example is available in #25.
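As a rough illustration of the frame-extraction step only, here is a minimal sketch that picks evenly spaced frames from a video and returns them as PIL images. The helper names `sample_frame_indices` and `extract_frames`, the default of 8 frames, and the use of OpenCV and Pillow are all assumptions for this example, not part of Ovis. How the resulting images are passed to the model is covered by the example in #25 and is not shown here.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Return up to num_frames evenly spaced frame indices (hypothetical helper).

    Each index is the midpoint of one of num_frames equal-length segments,
    so the very first and very last frames are not over-represented.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    return [int((i + 0.5) * step) for i in range(num_frames)]


def extract_frames(video_path: str, num_frames: int = 8):
    """Read the sampled frames from video_path as RGB PIL images (hypothetical helper).

    Assumes opencv-python and Pillow are installed; they are imported here
    so that the index helper above can be used without them.
    """
    import cv2
    from PIL import Image

    cap = cv2.VideoCapture(video_path)
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        images = []
        for idx in sample_frame_indices(total, num_frames):
            # Seek to the chosen frame, then decode it.
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if not ok:
                continue  # skip frames that fail to decode
            # OpenCV decodes to BGR; convert to RGB before building the image.
            images.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        return images
    finally:
        cap.release()
```

The number of frames is a trade-off: more frames cover the video better but make the multi-image input longer, so a small fixed count is a reasonable starting point.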
Separately, we are working on incorporating video data into our training process and plan to improve video processing capabilities in future versions.