Body movement, jitter, and artifact issues
A-2-H opened this issue · 2 comments
Thank you for your incredible work on this!
I was testing the program and its settings to get the best results out of it, and I found some things I wanted to share.
Here is the example video showing the comparison between fi-steps and acceleration:
https://youtu.be/YGqIvwcN3jQ
I tested the script with and without acceleration. Here are my thoughts:
- You can see that without acceleration there are fewer artifacts in the background compared to the accelerated video with no interpolation.
- In the videos without acceleration, and with acceleration but only 1 fi-step, the facial animation is more detailed and more accurate.
- The more interpolation (fi-steps), the less detailed the facial animation gets. With more fi-steps there is less body jitter, but we lose facial animation detail.
- Body movement does not follow the input video. I think this is because it is generated based on the 3D face position. The body has poor consistency, which I'm guessing is due to the lack of a reference in 3D space. I think the best way to stop the body from "jumping" every few seconds is to change how it is generated (it is currently based on the head position). It should have its own 3D reference, which could be just the torso with neck and head positions. Then the face position would line up with the neck/spine of the body and match the input video. I think the body starts to jump every few seconds because its position differs from the input video, which makes it harder to keep stable and consistent.
- If a body 3D reference/ControlNet implementation is not possible, maybe we could have separate interpolation settings for the face animation and the body, e.g. 3 fi-steps for the body and 1 fi-step for the face. This way the facial animation would stay as in the reference and face enhancement could do its job. The body position would still need to match the input video, though.
- Faces are usually very deformed, so I've used other programs to enhance them, and it works great, so maybe we could implement that as well. Faces would be based on the image input: first reconstruct the face, for example with a SimSwap model, and then enhance it with GFPGAN 1.4 or another model. For me, SimSwap 256 with GFPGAN 1.4 gave the best results for this character, but the scripts could offer more models to choose from. I guess that would make a good pipeline; a rough sketch of the enhancement step is below.
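For context, this is roughly what the GFPGAN part of that enhancement looks like on a single frame. It's a minimal sketch, not this repo's code: the SimSwap reconstruction is assumed to have been run separately, and the weight file and image paths are placeholders.

```python
import cv2
from gfpgan import GFPGANer

# Minimal sketch of the face-enhancement step on one extracted frame.
# "GFPGANv1.4.pth" and the image paths are placeholders, not part of this repo.
restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # pretrained GFPGAN 1.4 weights
    upscale=1,                    # keep the frame resolution unchanged
    arch="clean",
    channel_multiplier=2,
)

frame = cv2.imread("frame_after_simswap.png")  # frame already reconstructed by SimSwap
# GFPGAN detects the faces, restores them, and pastes them back into the frame.
_, _, restored_frame = restorer.enhance(
    frame, has_aligned=False, only_center_face=False, paste_back=True
)
cv2.imwrite("frame_enhanced.png", restored_frame)
```

Setting `upscale=1` keeps the output at the original frame size, so the enhanced frame can be dropped straight back into the video.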
Thank you for your interest in our work! We appreciate your viewpoints.
- We use optical flow to create intermediate frames in order to reduce AniPortrait's time consumption, and the artifacts are produced randomly by AniPortrait itself. Therefore, the relationship between the acceleration strategy and the background artifacts is not significant (see the flow-warping sketch after this list for the general idea).
- Frame interpolation produces a smoothing effect on the video, which reduces facial motion detail. Consequently, we suggest keeping fi-step below 3.
- To address the body flickering, you can add shoulder keypoints to the pose image while training the model (see the keypoint sketch after this list). But you also need to consider how to generate body motion during inference.
- Face enhancement is a custom option, and we will include this function in the inference pipeline in a future version if possible.
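For illustration of the flow-based acceleration idea only (this is not the exact implementation in this repository), a crude midpoint frame can be synthesized with OpenCV's Farneback flow. The function name and parameters below are placeholders:

```python
import cv2
import numpy as np

def midpoint_frame(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Sketch: synthesize one in-between frame by warping frame_a halfway
    along the dense optical flow towards frame_b. This is a crude
    approximation meant only to illustrate the general idea."""
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    # Farneback dense flow from frame_a to frame_b
    # (pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags)
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward-warp: sample frame_a at positions displaced by half the flow.
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame_a, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```

The actual acceleration may use a different flow estimator and warping scheme; this only shows why interpolated frames come out smoother and less detailed than frames the diffusion model generates itself.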
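As a rough illustration of what "adding shoulder keypoints to the pose image" could involve, here is a sketch using MediaPipe Pose as an assumed keypoint extractor; the training pipeline may use a different detector and drawing convention, and it assumes the pose image has the same resolution as the source frame:

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def draw_shoulder_keypoints(frame_path: str, pose_image_path: str, out_path: str) -> None:
    """Sketch: detect the left/right shoulder landmarks in a dataset frame and
    draw them onto the corresponding pose image used for training.
    MediaPipe is an assumed choice, not necessarily what the project uses."""
    frame = cv2.imread(frame_path)
    pose_img = cv2.imread(pose_image_path)
    h, w = pose_img.shape[:2]  # assumed to match the frame resolution
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return
    for lm_id in (mp_pose.PoseLandmark.LEFT_SHOULDER,
                  mp_pose.PoseLandmark.RIGHT_SHOULDER):
        lm = results.pose_landmarks.landmark[lm_id]
        x, y = int(lm.x * w), int(lm.y * h)  # landmarks are normalized to [0, 1]
        cv2.circle(pose_img, (x, y), 4, (0, 255, 0), -1)
    cv2.imwrite(out_path, pose_img)
```

Whatever extractor is used, the shoulder keypoints would have to be rendered consistently for every frame in the training set.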
Thank you for your quick response!
Point 3 is a very interesting subject that I couldn't find any information about in the training process, so could you please explain it in more detail?
- How do I add shoulder keypoints to my dataset? Is there a more detailed description of how to train the model?
- Will training on a 2 TB dataset with body keypoints reduce the flickering in every body position, or will I still need to be very gentle with body motion during inference?
- In my example video the shoulder movement in the input video was barely visible but still generated flickers. Is that because the pretrained model itself was not trained much on shoulder keypoints?
- What exactly do you mean by "consider how to generate body motion"? Do you mean the torso or head movement should be less "dynamic" and more subtle? Or specific movements?
I'm very curious about this, because if it is significant for the flickering, then I'm considering training the model myself to reduce it.