yohanshin/WHAM

An immobile person's pelvis bone moves up/down

Closed this issue · 5 comments

Hi,

I noticed that in any video where the person doesn't move, WHAM still moves the pelvis, which makes the immobile person drift up/down and sometimes left/right.

The problem is with the pelvis bone: WHAM doesn't recognize that the person is standing still.

A sample video is not required: any recording of someone moving only their hands, with their feet staying in place, will reproduce this bug.

Thank you.

Hi @polidox2

Thanks for pointing this out. I am not 100% sure about the issue you raised, but WHAM does not always provide perfect estimation (although it shows SOTA performance in benchmarks). I just posted a sample result of a golfer here. Is this qualitative result aligned with your question?

Thank you for your reply!
No, actually the issue is much more serious than in your golfer video, especially in long videos (30 s+).

bandicam_2024-02-05_13-46-32-822.mp4

In this video, I am training my legs while staying in place; however, in WHAM the pelvis drifts very far away, producing very unnatural, large movements.

I also recorded myself just speaking with hand movements and my feet staying in place, but WHAM gives this result:

bandicam.2024-02-07.08-39-59-857.mp4

I sped up both videos so you can see how serious it is.

Thank you!

@polidox2

Hi, thanks for sharing your results. For the first case, it is because WHAM was not well trained on those exercise motions, and thus its global motion prediction may be incorrect. For the second video, it's a bit surprising to see that foot sliding even with a static pose. I just searched for a speech video of a standing person, and I don't see any foot sliding, as shown below.

output.mp4

Did you visualize the global motion? If so, was the camera moving in your case?

@yohanshin I think @polidox2 was using the addon version I'm making for WHAM.
After running some tests, I found that using the local parameters (that is, not the world ones) for translation and pose, and also removing the Z axis of the translation (because it "jitters", as also happens with 4D Humans), gave, at least for me, much better and more stable results.

Here is a sample of the result I'm getting (with the timeline sped up):

blender_QfeSAgDEXw.mp4
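
The "remove the Z axis of the translation" step described above could be sketched roughly as follows. This is only an illustration, not the addon's actual code: the function name `drop_translation_axis`, the `(num_frames, 3)` array layout, and the assumption that index 2 is the jittery Z axis are all hypothetical, and the input here is synthetic data rather than real WHAM output.

```python
import numpy as np


def drop_translation_axis(trans, axis=2):
    """Return a copy of a (num_frames, 3) translation array with one axis zeroed.

    Hypothetical helper: assumes one XYZ translation per frame and that
    `axis` (default 2, i.e. Z) is the component to discard.
    """
    trans = np.asarray(trans, dtype=float)
    if trans.ndim != 2 or trans.shape[1] != 3:
        raise ValueError("expected an array of shape (num_frames, 3)")
    out = trans.copy()      # leave the caller's array untouched
    out[:, axis] = 0.0      # discard the jittery component on every frame
    return out


# Synthetic stand-in for a per-frame local translation (NOT real WHAM output):
# X and Y are constant, Z carries small random jitter around 3.0.
rng = np.random.default_rng(0)
local_trans = np.column_stack([
    np.full(5, 0.1),
    np.full(5, -0.2),
    3.0 + 0.05 * rng.standard_normal(5),
])

stable_trans = drop_translation_axis(local_trans)
```

Zeroing the axis throws away any real motion along it as well, so this only suits clips where the subject stays at a roughly fixed distance, such as the standing and talking videos in this thread.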

The issue is caused by shrinking; see #64.