MFaceTech/AnimateAnyone-SVD

training

Opened this issue · 4 comments

Do you only train controlnet?

Yes, for the v1 pretrained checkpoint we only train the ControlNet; you can also train the temporal layers with a sufficient amount of data if needed.

@MFaceTech Thanks for your reply. I notice you use the first frame as both the reference image and the condition image latents, which does not fit long-range video inference. Have you tested long videos? Why not just randomly choose one frame from the whole video as the reference image?
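The sampling strategy suggested above could be sketched roughly as follows. This is a hypothetical helper, not code from the repo; the function name and parameters are assumptions for illustration only.

```python
import random

def sample_ref_and_clip(num_frames, clip_len, seed=None):
    """Hypothetical sampler: pick a random reference frame from the
    whole video plus a contiguous training clip, instead of always
    using frame 0 as the reference."""
    rng = random.Random(seed)
    # Any frame in the video may serve as the reference image.
    ref_idx = rng.randrange(num_frames)
    # Sample a contiguous clip of target frames for the denoiser.
    start = rng.randrange(num_frames - clip_len + 1)
    clip = list(range(start, start + clip_len))
    return ref_idx, clip
```

Decoupling the reference frame from the clip start this way means the model sees reference/target pairs that are far apart in time, which should reduce the bias toward the first frame.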

@jiangzhengkai This is a good observation, and we have also noticed that using the first frame as a reference image is not favorable for long-range video inference. This issue has been addressed in v1.1, and we will revise the code and release the models soon.

@MFaceTech Could you briefly explain, from a technical standpoint, how long-range video inference is enabled?