High-quality zero-shot lipsync pipeline built on LivePortrait

Question

mvoodarla opened this issue 3 months ago · 9 comments

Hey folks! My team has been exploring zero-shot lipsyncing for a while, and we think we've improved noticeably on MuseTalk's quality by using LivePortrait to neutralize the expression and CodeFormer to enhance the result. Here's an example.

short.mp4

We wrote a technical blog on it: https://www.sievedata.com/blog/sievesync-zero-shot-lipsync-api-developers

Hope to put out an OSS repo soon too :)

Anything we don't talk about in the blog that we should in our repo release?

Answer 1 · 2024-09-17T17:13:22.000Z

No CodeFormer, no Stable Diffusion, just Audio2Head and LivePortrait. So you want to attach a price to this open-source software now?

This actually took me 6.5 minutes.

final_video.mp4

Answer 2 · 2024-09-17T17:40:10.000Z

Just another example of FREE

a6ba35dc-fc58-4108-85fd-478bf88d1241.mp4

Answer 3 · 2024-09-18T14:10:45.000Z

It seems like a good practical mix of MuseTalk and LivePortrait 👍 @mvoodarla Will it be open-sourced soon?

Answer 4 · 2024-09-19T00:23:05.000Z

Hey @ziyaad30, those generations look nice! While we plan to open source relevant parts of our code, the full system is tailored to our infrastructure and wouldn't be directly usable by most developers. Our blog details the steps to achieve this quality for those interested in replicating it.

We charge for the service to cover the significant GPU costs for inference with large Stable Diffusion models. Our pay-per-use model is more accessible than the upfront cost of purchasing hardware. We're committed to open sourcing more as we develop with open source models, but some costs will always remain due to GPU requirements.

We plan to release an OSS repo soon (see the bottom of the blog for details!).

Answer 5 · 2024-09-19T13:06:45.000Z

If you can release it so that people who have the GPU power to run it, but can't afford to pay, can use it, that would be great.

Answer 6 · 2024-09-28T00:12:58.000Z

here it is! https://github.com/sieve-community/sievesync

Answer 7 · 2024-09-30T04:29:16.000Z

Etti _singh

Answer 8 · 2024-09-30T04:29:39.000Z

Etti _singh

Answer 9 · 2024-10-02T16:18:02.000Z

@mvoodarla Thanks! Your model's performance is quite good. It seems that the main framework is still MuseTalk, so I'm curious how much impact the retargeting module has on the results. Could you provide some examples with and without retargeting to illustrate the difference?
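
For anyone who wants to try that with/without comparison themselves, here is a minimal sketch of how the ablation could be wired up. Everything in it is an assumption for illustration: the stage order (neutralize, then lipsync, then enhance), the function names, and the toy list-of-floats "frames" are placeholders, not the actual sievesync, MuseTalk, LivePortrait, or CodeFormer APIs. The stage bodies are dummy arithmetic that you would swap for real model calls.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image; real code would use arrays
Stage = Callable[[List[Frame]], List[Frame]]


def neutralize_expression(frames: List[Frame]) -> List[Frame]:
    # Hypothetical placeholder for the retargeting step:
    # here it just centers each frame on its mean value.
    return [[p - sum(f) / len(f) for p in f] for f in frames]


def lipsync(frames: List[Frame]) -> List[Frame]:
    # Hypothetical placeholder for the lipsync model:
    # here it just shifts the first value of each frame.
    return [[f[0] + 1.0] + f[1:] for f in frames]


def enhance(frames: List[Frame]) -> List[Frame]:
    # Hypothetical placeholder for face restoration:
    # here it just rounds values.
    return [[round(p, 3) for p in f] for f in frames]


def run_pipeline(frames: List[Frame], stages: List[Stage]) -> List[Frame]:
    # Apply each stage in order to a copy of the input frames.
    out = [list(f) for f in frames]
    for stage in stages:
        out = stage(out)
    return out


def mean_abs_diff(a: List[Frame], b: List[Frame]) -> float:
    # Average per-value absolute difference between two frame sequences.
    total = sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    count = sum(len(fa) for fa in a)
    return total / count


frames = [[0.2, 0.4, 0.6], [0.1, 0.1, 0.4]]

# Same pipeline twice; the only difference is whether retargeting runs.
with_retargeting = run_pipeline(frames, [neutralize_expression, lipsync, enhance])
without_retargeting = run_pipeline(frames, [lipsync, enhance])

print("mean abs diff:", mean_abs_diff(with_retargeting, without_retargeting))
```

Keeping the stages as a plain list makes the ablation a one-line change, so the two outputs differ only by the retargeting step. With real models you would probably compare the rendered videos side by side rather than rely on a pixel difference.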