AIDC-AI/Ovis

Any plans to support video and audio?

Ovis is really good. Could you please support video and audio?

Thank you for your positive feedback on Ovis.

It's common practice to extract multiple frames from a video to create a multi-image input. While Ovis1.6 is primarily trained on single-image samples, it also supports multi-image inputs. An example is available at: #25
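The frame-extraction step described above could be sketched roughly as follows. This is a minimal illustration, not code from the Ovis repository: the helper names `sample_frame_indices` and `extract_frames` are hypothetical, and it assumes OpenCV (`cv2`) and Pillow are installed. It does not call any Ovis API; the resulting list of images would be passed to the model as a multi-image input following the example in #25.

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick up to `num_frames` evenly spaced frame indices (hypothetical helper).

    The video is split into `num_frames` equal segments and the middle
    frame of each segment is chosen.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        # Fewer frames than requested: use every frame.
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


def extract_frames(video_path, num_frames=8):
    """Read evenly spaced frames from a video as RGB PIL images (hypothetical helper).

    Assumes OpenCV and Pillow are available; they are imported lazily so
    the index-sampling helper above has no third-party dependencies.
    """
    import cv2
    from PIL import Image

    cap = cv2.VideoCapture(video_path)
    try:
        # The reported frame count can be approximate for some containers.
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in sample_frame_indices(total, num_frames):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if not ok:
                continue  # skip frames that fail to decode
            # OpenCV decodes to BGR; convert to RGB before wrapping in PIL.
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        return frames
    finally:
        cap.release()
```

Since Ovis1.6 is primarily trained on single-image samples, keeping `num_frames` small is probably a sensible starting point; the default of 8 here is an arbitrary choice, not a recommendation from the maintainers.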

We are also working on incorporating video data into our training process and plan to improve video processing in future versions.