Can we use this for realtime generation?
muhammadumair894 opened this issue · 2 comments
From the paper, under "Training and Inference Hardware": a GPU with 8 GB of VRAM can generate up to 3 minutes of video in one inference.
Hi there,
First of all, splendid work! I really appreciate the generalizability of this model and am looking forward to the code release.
I have a question regarding the paper's mention that a single 8GB GPU can produce a 3-minute video. Is it possible to make this process real-time or to stream the generated video so it appears as a real-time conversation?
Also, how long does one inference take to generate 3 minutes of video?
Currently, motion generation is a long-running, offline process (taking anywhere from several seconds to several minutes), so it cannot achieve real-time performance like VASA-1. Streaming by generating very short segments is also problematic: if the segments are too short, the transitions between consecutive segments lack smoothness.
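One common workaround for seams between segments (not from this repo; a minimal, hypothetical sketch with numpy) is to generate consecutive segments with a few overlapping frames and linearly cross-fade the overlap, so the motion features transition gradually instead of jumping at the boundary:

```python
import numpy as np

def blend_segments(seg_a: np.ndarray, seg_b: np.ndarray, overlap: int) -> np.ndarray:
    """Cross-fade the last `overlap` frames of seg_a into the first
    `overlap` frames of seg_b, then concatenate the result.

    Both segments are (num_frames, feature_dim) motion-feature arrays.
    """
    assert overlap <= len(seg_a) and overlap <= len(seg_b)
    # fade-in weights: 0.0 at the start of the overlap, 1.0 at the end
    w = np.linspace(0.0, 1.0, overlap)[:, None]
    blended = (1.0 - w) * seg_a[-overlap:] + w * seg_b[:overlap]
    return np.concatenate([seg_a[:-overlap], blended, seg_b[overlap:]])

# toy example: two 10-frame segments with 3-dim motion features
a = np.zeros((10, 3))   # segment ending at value 0
b = np.ones((10, 3))    # segment starting at value 1
out = blend_segments(a, b, overlap=4)
print(out.shape)  # (16, 3): 6 + 4 blended + 6 frames
```

This trades a little extra computation (the overlapping frames are generated twice) for continuity at segment boundaries; it does not make generation real-time, but it is the usual starting point for chunked streaming.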