High-quality zero-shot lipsync pipeline built on LivePortrait
mvoodarla opened this issue · 9 comments
Hey folks! My team has been exploring zero-shot lipsyncing for a while, and we think we've noticeably improved on MuseTalk's quality by using LivePortrait to neutralize expression and CodeFormer to enhance the result. Here's an example.
short.mp4
We wrote a technical blog on it: https://www.sievedata.com/blog/sievesync-zero-shot-lipsync-api-developers
Hope to put out an OSS repo soon too :)
Anything we don't talk about in the blog that we should in our repo release?
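For readers who want a mental model of the stages mentioned above (LivePortrait to neutralize expression, MuseTalk for the lipsync itself, CodeFormer to enhance), here is a minimal sketch of how such a chain might be wired together. Everything in it is an assumption for illustration: the stage functions are hypothetical placeholders that only tag strings, not the real LivePortrait, MuseTalk, or CodeFormer APIs, and the stage order shown is a guess rather than the confirmed design.

```python
from typing import Callable, List

# A frame is represented as a string tag here; a real pipeline would
# pass image arrays. This stand-in keeps the sketch runnable anywhere.
Frame = str
Stage = Callable[[List[Frame]], List[Frame]]


def neutralize_expression(frames: List[Frame]) -> List[Frame]:
    """Hypothetical placeholder for a LivePortrait-style retargeting step."""
    return [f"{frame}|neutral" for frame in frames]


def make_lipsync(audio_name: str) -> Stage:
    """Hypothetical placeholder for a MuseTalk-style lipsync step.

    Returns a stage bound to one audio track, so it fits the same
    frames-in, frames-out shape as the other stages.
    """
    def lipsync(frames: List[Frame]) -> List[Frame]:
        return [f"{frame}|lipsync({audio_name})" for frame in frames]
    return lipsync


def enhance(frames: List[Frame]) -> List[Frame]:
    """Hypothetical placeholder for a CodeFormer-style enhancement step."""
    return [f"{frame}|enhanced" for frame in frames]


def run_pipeline(frames: List[Frame], stages: List[Stage]) -> List[Frame]:
    """Apply each stage in order to the full list of frames."""
    out = list(frames)
    for stage in stages:
        out = stage(out)
    return out


if __name__ == "__main__":
    frames = ["f0", "f1"]
    lipsync = make_lipsync("speech.wav")

    full = run_pipeline(frames, [neutralize_expression, lipsync, enhance])
    no_retarget = run_pipeline(frames, [lipsync, enhance])

    print(full)
    print(no_retarget)
```

Because each stage has the same frames-in, frames-out shape, dropping one from the list is enough to produce a with/without comparison for that stage.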
No CodeFormer, no Stable Diffusion, just Audio2Head and LivePortrait. So you want to attach a price to this open source software now?
This actually took me 6.5 minutes
final_video.mp4
Just another example of FREE
a6ba35dc-fc58-4108-85fd-478bf88d1241.mp4
It seems like a good practical mix of MuseTalk and LivePortrait. @mvoodarla Will it be open-sourced soon?
Hey @ziyaad30, those generations look nice! While we plan to open source relevant parts of our code, the full system is tailored to our infrastructure and wouldn't be directly usable by most developers. Our blog details the steps to achieve this quality for those interested in replicating it.
We charge for the service to cover the significant GPU costs for inference with large Stable Diffusion models. Our pay-per-use model is more accessible than the upfront cost of purchasing hardware. We're committed to open sourcing more as we develop with open source models, but some costs will always remain due to GPU requirements.
We plan to release an OSS repo soon (see the bottom of the blog for details!).
If you can do that and release it, so that those who have the GPU power to run it but no means to pay can use it, that would be very good.
here it is! https://github.com/sieve-community/sievesync
@mvoodarla Thanks! Your model's performance is quite good. It seems that your model's main framework is still MuseTalk, so I'm curious how much impact the retargeting module has on the results. Could you provide some examples w/ and w/o retargeting to illustrate the difference?