TaatiTeam/MotionAGFormer

How to get the joint point coordinates

Closed this issue · 4 comments

Hello, this works great. But I have a question: how can I get the reconstructed joint point coordinates?

Hi @CapenZen
I'm not sure if I understand. Could you elaborate on what you mean?

Sorry, maybe my wording wasn't very clear. What I mean is: for in-the-wild videos, can I get the specific three-dimensional joint coordinates of the human body, something like [1,2,3]?

@CapenZen As I explained in issue 4, we're dealing with an ill-posed problem: a tall person far from the camera has the same 2D projection scale as a short person close to the camera. Therefore, similar to MotionBERT and LCN, the model outputs (lambda × actual 3D pose) instead of the actual pose directly. The way to compute lambda is explained in issue 4; please read the image I attached there from the LCN paper.
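
To see why this is ill-posed, here is a minimal sketch (the focal length and joint position are hypothetical values, not from the repo) showing that scaling a 3D point and its depth by the same factor lambda leaves its perspective projection unchanged:

```python
import numpy as np

f = 1000.0                          # hypothetical focal length, in pixels
joint = np.array([0.3, 0.9, 4.0])   # hypothetical (X, Y, Z) of one joint, in meters

def project(p, focal):
    """Perspective projection of a 3D point onto the image plane: u = f * X / Z."""
    return focal * p[:2] / p[2]

lam = 1.5                            # a 1.5x taller person, 1.5x farther from the camera
print(project(joint, f))             # [ 75. 225.]
print(project(lam * joint, f))       # [ 75. 225.] -- identical projection, hence the ambiguity
```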

So for in-the-wild videos, when you look at the model output, there are two things to notice:

  1. The output is normalized to the range [-1, 1]. It can be denormalized the same way I explained in issue #4 (see the sketch after this list).
  2. The output is multiplied by a factor lambda. Unfortunately, it is impossible to recover lambda and multiply by 1/lambda to get the actual coordinates, because the problem is ill-posed.
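
For point 1, here is a minimal denormalization sketch, assuming a MotionBERT-style convention where coordinates were normalized by the frame size (the exact constants should be checked against issue #4; `denormalize`, `img_w`, and `img_h` are hypothetical names):

```python
import numpy as np

def denormalize(pose_norm, img_w, img_h):
    """Map a normalized (T, 17, 3) pose back to pixel units.

    pose_norm    : model output in [-1, 1] (still scaled by the unknown lambda)
    img_w, img_h : resolution of the original video frames
    """
    pose = pose_norm.copy()
    scale = min(img_w, img_h) / 2.0                   # assumed normalization scale
    pose *= scale                                     # undo the [-1, 1] scaling on x, y, z
    pose[..., :2] += np.array([img_w, img_h]) / 2.0   # re-center x, y in the image
    return pose
```

Even after this step, the result is still (lambda × actual pose), per point 2.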

Note that other works like STCFormer or PoseFormerV2 do not consider this lambda, and their models output the pose coordinates directly. But that implicitly assumes the person is at the same distance from the camera as in the training data, which does not hold for in-the-wild examples.

The MotionAGFormer trained on MPI-INF-3DHP also does not use any lambda. So after denormalization, the output is the model's estimate of the actual scale (which is biased toward the camera distances seen in the training data and is not perfect).

Thank you for your patient explanation. It has deepened my understanding.