Can we use this for realtime generation?
muhammadumair894 opened this issue · 2 comments
From the paper, under "Training and Inference Hardware": a GPU with 8 GB of VRAM can generate up to 3 minutes of video in one inference.
Hi there,
First of all, splendid work! I really appreciate the generalizability of this model and am looking forward to the code release.
I have a question regarding the paper's mention that a single 8GB GPU can produce a 3-minute video. Is it possible to make this process real-time or to stream the generated video so it appears as a real-time conversation?
Also, how long does one inference take to generate 3 minutes of video?
Currently, motion generation is a long-running, offline process (taking anywhere from several seconds to several minutes), so it cannot achieve real-time performance like VASA-1. Streaming by generating very short segments is also problematic: if the segments are too short, the transitions between consecutive segments lack smoothness.
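One common workaround for seams between segments (not from this repo; a minimal, hypothetical sketch with numpy) is to generate consecutive segments with a few overlapping frames and linearly cross-fade the overlap, so the motion features transition gradually instead of jumping at the boundary:

```python
import numpy as np

def blend_segments(seg_a: np.ndarray, seg_b: np.ndarray, overlap: int) -> np.ndarray:
    """Cross-fade the last `overlap` frames of seg_a into the first
    `overlap` frames of seg_b, then concatenate the result.

    Both segments are (num_frames, feature_dim) motion-feature arrays.
    """
    assert overlap <= len(seg_a) and overlap <= len(seg_b)
    # fade-in weights: 0.0 at the start of the overlap, 1.0 at the end
    w = np.linspace(0.0, 1.0, overlap)[:, None]
    blended = (1.0 - w) * seg_a[-overlap:] + w * seg_b[:overlap]
    return np.concatenate([seg_a[:-overlap], blended, seg_b[overlap:]])

# toy example: two 10-frame segments with 3-dim motion features
a = np.zeros((10, 3))   # segment ending at value 0
b = np.ones((10, 3))    # segment starting at value 1
out = blend_segments(a, b, overlap=4)
print(out.shape)  # (16, 3): 6 + 4 blended + 6 frames
```

This trades a little extra computation (the overlapping frames are generated twice) for continuity at segment boundaries; it does not make generation real-time, but it is the usual starting point for chunked streaming.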