X-LANCE/AniTalker

Can we use this for realtime generation?

muhammadumair894 opened this issue · 2 comments

Training and Inference Hardware: specifically, a GPU with 8 GB of VRAM can generate up to 3 minutes of video in one inference.

Hi there,

First of all, splendid work! I really appreciate the generalizability of this model and am looking forward to the code release.

I have a question regarding the paper's mention that a single 8GB GPU can produce a 3-minute video. Is it possible to make this process real-time or to stream the generated video so it appears as a real-time conversation?

How long does one inference take to generate 3 minutes of video?

Currently, motion generation is a long-running process (taking several seconds to several minutes), so it cannot achieve real-time performance like VASA-1. If the segments are made too short, there is a lack of smoothness between consecutive segments.
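To illustrate the segment-smoothness issue mentioned above: a common workaround when stitching independently generated chunks is to generate them with a small overlap and linearly crossfade the overlapping frames. This is only a hedged sketch of that general technique, not code from AniTalker; the function name, segment shapes, and overlap length are all illustrative assumptions.

```python
import numpy as np

def crossfade_segments(seg_a: np.ndarray, seg_b: np.ndarray, overlap: int) -> np.ndarray:
    """Blend the last `overlap` frames of seg_a into the first `overlap`
    frames of seg_b with linear weights, then concatenate the remainder.
    Each segment is shaped (frames, motion_dims). Hypothetical helper,
    not part of the AniTalker codebase."""
    w = np.linspace(0.0, 1.0, overlap)[:, None]  # blend weight ramps 0 -> 1
    blended = (1.0 - w) * seg_a[-overlap:] + w * seg_b[:overlap]
    return np.concatenate([seg_a[:-overlap], blended, seg_b[overlap:]], axis=0)

# Example: two 30-frame segments of 10-dim motion latents, 5-frame overlap.
a = np.zeros((30, 10))
b = np.ones((30, 10))
out = crossfade_segments(a, b, overlap=5)
print(out.shape)  # (55, 10): 25 + 5 blended + 25 frames
```

Crossfading hides hard cuts at segment boundaries, but it cannot add back long-range motion coherence, which is why very short segments still look choppy.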