ml5js/ml5-website-v02-docsify

BlazePose documentation of z coordinates


Hi everyone! I'm working on a video tutorial about BodyPose in ml5.js 1.0. While making the video, I found some improvements I think we could make to the documentation of how the 3D coordinates work with BlazePose.

In the tfjs-models documentation the units for keypoints3D are explained as follows:

For the keypoints3D, x, y and z represent absolute distance in meters in a 2 x 2 x 2 meter cubic space. The range for each axis goes from -1 to 1 (therefore 2m total delta). The z is always perpendicular to the xy plane that passes the center of the hip, so the coordinate for the hip center is (0, 0, 0).
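To make these units concrete, here is a minimal sketch. It assumes only what the tfjs-models text above states: each axis spans -1 to 1 meters, and the hip center is the origin (0, 0, 0). Under those assumptions, a keypoint's straight-line distance from the hips in meters is just its distance from the origin. The nose values are copied from the raw keypoints3D output posted further down in this thread, and `distance3D` is an illustrative helper, not part of ml5 or tfjs.

```javascript
// Illustrative helper (not part of ml5/tfjs): Euclidean distance between two
// keypoints3D entries. Assuming the tfjs units, the result is in meters.
function distance3D(a, b) {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// The hip center is the origin of the keypoints3D coordinate space.
const hipCenter = { x: 0, y: 0, z: 0 };

// Nose values from the raw BlazePose keypoints3D output in this thread.
const nose3D = { x: 0.01237955486137293, y: -0.587451014244643, z: -0.2591571422454245 };

console.log(`nose is about ${distance3D(nose3D, hipCenter).toFixed(2)} m from the hip center`);
```

Because every keypoints3D value lives in the same 2 x 2 x 2 meter cube, distances like this are comparable across frames regardless of how far the person stands from the camera, which is not true of the pixel-based 2D keypoints.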

We should probably include a simplified version of this in our documentation here:

[Screenshot of the current documentation, taken 2024-10-07]

I was also confused to find that the 2D keypoints array (as described in our docs) also includes a z value. I don't believe this is part of the original BlazePose data. It looks like this is the code where it is being added, but the units appear to be different. @ziyuan-linn, do you know offhand what is happening here? Is there some code I'm missing that converts the real-world "meters" range to pixel units?

This is what I see in the console:

keypoints:
[Screenshot: console output of keypoints]

keypoints3D:
[Screenshot: console output of keypoints3D]

And under the nose property:
[Screenshot: console output of the named nose keypoint]

(Interestingly, the confidence score is different for keypoints3D!)

I just ran BlazePose, and here is the raw, unprocessed output.

[
    {
        "score": 0.995323121547699,
        "keypoints": [
            {
                "x": 245.63711038340324,
                "y": 294.79695594356946,
                "z": -689535.245693554,
                "score": 0.9987707333798664,
                "name": "nose"
            },
            {
                "x": 270.6604500325918,
                "y": 252.8618703269841,
                "z": -639394.1690627289,
                "score": 0.9983254733819559,
                "name": "left_eye_inner"
            },
            // ...
        ],
        "keypoints3D": [
            {
                "x": 0.01237955486137293,
                "y": -0.587451014244643,
                "z": -0.2591571422454245,
                "score": 0.9982515152476616,
                "name": "nose"
            },
            {
                "x": 0.026530376499100554,
                "y": -0.6236724986839615,
                "z": -0.24141936558530744,
                "score": 0.9973494677709942,
                "name": "left_eye_inner"
            },
            // ...
        ]
    }
]
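Since the two arrays use different units and carry separate scores, it can help to look the same keypoint up in both. A minimal sketch, assuming a `pose` object shaped like the raw output above (trimmed to the nose); `findByName` is a hypothetical helper, not an ml5 or tfjs function.

```javascript
// A pose shaped like the raw BlazePose output above, trimmed to one keypoint.
const pose = {
  score: 0.995323121547699,
  keypoints: [
    { x: 245.63711038340324, y: 294.79695594356946, z: -689535.245693554, score: 0.9987707333798664, name: "nose" },
  ],
  keypoints3D: [
    { x: 0.01237955486137293, y: -0.587451014244643, z: -0.2591571422454245, score: 0.9982515152476616, name: "nose" },
  ],
};

// Hypothetical helper: find a keypoint by name, or undefined if it is missing.
function findByName(list, name) {
  return list.find((kp) => kp.name === name);
}

const nose2D = findByName(pose.keypoints, "nose");
const nose3D = findByName(pose.keypoints3D, "nose");

// keypoints: x/y in pixels. keypoints3D: x/y/z in meters around the hip center.
console.log("2D (pixels):", nose2D.x, nose2D.y);
console.log("3D (meters):", nose3D.x, nose3D.y, nose3D.z);
// Each array carries its own score for the same keypoint.
console.log("score difference:", nose2D.score - nose3D.score);
```

Matching by `name` rather than by array index avoids relying on both arrays always sharing the same order, even though they do in this output.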

It looks like z values are present for the 2D keypoints. I have no idea what they represent, and they don't seem to be documented either. If everyone agrees, I think we can just remove that value.

I think the named keypoints just copy the x, y, z, and confidence values from the keypoints array. Should we also add the 3D values to the named keypoints?

Thank you for looking into this @ziyuan-linn! Let's do the following:

  • Remove the z value from the keypoints array.
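A minimal sketch of what that removal could look like. `toKeypoints2D` is a hypothetical helper, not ml5's actual code; it assumes the raw tfjs fields shown above (`x`, `y`, `z`, `score`, `name`) and renames `score` to `confidence`, the name ml5 uses in the shapes discussed in this thread.

```javascript
// Hypothetical helper (not ml5's implementation): drop the unreliable 2D `z`
// and expose tfjs's `score` under ml5's `confidence` name.
function toKeypoints2D(rawKeypoints) {
  return rawKeypoints.map(({ x, y, score, name }) => ({ x, y, confidence: score, name }));
}

// Two entries from the raw BlazePose keypoints output in this thread.
const raw = [
  { x: 245.63711038340324, y: 294.79695594356946, z: -689535.245693554, score: 0.9987707333798664, name: "nose" },
  { x: 270.6604500325918, y: 252.8618703269841, z: -639394.1690627289, score: 0.9983254733819559, name: "left_eye_inner" },
];

console.log(toKeypoints2D(raw));
```

Destructuring only the wanted fields means any extra field from the model (like this z) is dropped by default rather than passed through.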

I was about to suggest adding only the z value from keypoints3D to the named keypoints, but it might make more sense for us to provide the full x, y, z. What about the following:

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  keypoint3D: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

Or is this overdoing it and making it super complicated? @MOQN I'd be curious for your thoughts?
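To gauge how complicated this would be to produce, here is a minimal sketch of building the proposed nested shape from raw BlazePose output. `addNamedKeypoints` is a hypothetical helper, not ml5's implementation; it assumes each 2D keypoint has a 3D counterpart with the same `name`, and it uses the 2D score as the named keypoint's `confidence`, as in the proposal.

```javascript
// Hypothetical helper (not ml5 code): add one property per keypoint name,
// with the 3D coordinates nested under `keypoint3D` as proposed.
function addNamedKeypoints(pose) {
  // Index the 3D keypoints by name for direct lookup.
  const byName3D = new Map(pose.keypoints3D.map((kp) => [kp.name, kp]));
  const named = {};
  for (const kp of pose.keypoints) {
    const kp3D = byName3D.get(kp.name);
    named[kp.name] = {
      x: kp.x,
      y: kp.y,
      confidence: kp.score, // tfjs calls it `score`; ml5 exposes `confidence`
      // Nest only x, y, z; null if no 3D keypoint has the same name.
      keypoint3D: kp3D ? { x: kp3D.x, y: kp3D.y, z: kp3D.z } : null,
    };
  }
  // Return a new object instead of mutating the input pose.
  return { ...pose, ...named };
}

// Usage with values trimmed from the raw BlazePose output in this thread.
const pose = addNamedKeypoints({
  keypoints: [{ x: 245.63711038340324, y: 294.79695594356946, score: 0.9987707333798664, name: "nose" }],
  keypoints3D: [{ x: 0.01237955486137293, y: -0.587451014244643, z: -0.2591571422454245, score: 0.9982515152476616, name: "nose" }],
});

console.log(pose.nose);
```

For the user, the cost is one extra level of access (`pose.nose.keypoint3D.z`), while the flat 2D fields stay exactly where they are today.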

On another note, I'm trying to figure out why the tfjs docs list 4 extra points for BlazePose that don't actually show up in the model. @ziyuan-linn have you run across this in your research at all?

34: forehead
35: leftThumb
36: leftHand
37: rightThumb
38: rightHand

@shiffman I also have no idea what those keypoints are. The tfjs documentation can sometimes be puzzling. As the Google MediaPipe model card indicates, the BlazePose model is trained with 33 keypoints, so I think it should be safe to ignore them.

I will reply with my thoughts about the API in the ml5-next-gen thread.

Actually, looking at the model card, I think the 2D z value is meant to be the following:

Z coordinate is measured in "image pixels" like the X and Y screen coordinates and represents the distance relative to the plane of the subject's hips, which is the origin of the Z axis. Negative values are between the hips and the camera; positive values are behind the hips. Z coordinate scale is similar with X, Y scales but has different nature as obtained not via human annotation, by fitting synthetic data (GHUM model) to the 2D annotation. Note, that Z is not metric but up to scale.

However, a value like -437589 is nowhere near accurate. I think removing it for now might be the best choice.

MOQN commented

Thank you for all of these findings and thoughtful discussion. I completely agree with removing the Z value!

MOQN commented
nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  keypoint3D: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

Or is this overdoing it and making it super complicated? @MOQN I'd be curious for your thoughts?

@shiffman, I believe it's a great suggestion. Very intuitive! (Edit: I wrote this too quickly and suggested 3d without realizing the key would begin with a number, haha.) Alternatively, we could use pos3D, position, position3D, coords, coords3D, or depth instead of keypoint3D.

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  position: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

or

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  x3D: 0.05988978072436527,
  y3D: -0.5489126977664187,
  z3D: -0.26418375968933105
}
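One way to compare these options is by how user code reads the value. A minimal sketch, with values copied from the proposals above (rounded); the four objects are illustrative only. It also shows why a key like 3d is awkward: it is legal when quoted, but it can only be read with bracket notation.

```javascript
// The nose keypoint under each naming option discussed above (rounded values).
const withKeypoint3D = { x: 332.6, y: 265.78, confidence: 0.9994, keypoint3D: { x: 0.0599, y: -0.5489, z: -0.2642 } };
const withPosition = { x: 332.6, y: 265.78, confidence: 0.9994, position: { x: 0.0599, y: -0.5489, z: -0.2642 } };
const flat = { x: 332.6, y: 265.78, confidence: 0.9994, x3D: 0.0599, y3D: -0.5489, z3D: -0.2642 };

// A key starting with a digit is allowed only when quoted...
const withDigitKey = { x: 332.6, y: 265.78, confidence: 0.9994, "3d": { x: 0.0599, y: -0.5489, z: -0.2642 } };

console.log(withKeypoint3D.keypoint3D.z);
console.log(withPosition.position.z);
console.log(flat.z3D);
// ...and must be read with brackets: withDigitKey.3d.z is a SyntaxError.
console.log(withDigitKey["3d"].z);
```

The flat x3D/y3D/z3D option keeps everything one level deep, while a nested object keeps the pixel coordinates and the meter coordinates visibly separate.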

keypoints and keypoints3D could then be used only as the array names that hold the full position data:

keypoints: [{ x, y, confidence, name }, ...],
keypoints3D: [{ x, y, z, confidence, name }, ...],

[
  {
    box: { width, height, xMax, xMin, yMax, yMin },
    id: 1,
    keypoints: [{ x, y, confidence, name }, ...],
    keypoints3D: [{ x, y, z, confidence, name }, ...],
    left_ankle: { x, y, z, confidence },
    ...
    confidence: 0.28,
  },
  ...
];
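A minimal sketch of how a sketch author might consume this structure, for example by keeping only 3D keypoints above a confidence threshold. The nose values come from the examples in this thread; the left_ankle values are made up for illustration, and both `confident3D` and its 0.5 default threshold are hypothetical choices, not ml5 API.

```javascript
// A pose shaped like the proposal above; only the fields used here are filled in.
// Nose values are from this thread; left_ankle values are made up.
const pose = {
  confidence: 0.28,
  keypoints: [
    { x: 332.6024622758805, y: 265.78330263473146, confidence: 0.99, name: "nose" },
    { x: 310.2, y: 610.4, confidence: 0.12, name: "left_ankle" },
  ],
  keypoints3D: [
    { x: 0.05988978072436527, y: -0.5489126977664187, z: -0.26418375968933105, confidence: 0.99, name: "nose" },
    { x: 0.09, y: 0.8, z: 0.05, confidence: 0.1, name: "left_ankle" },
  ],
};

// Hypothetical helper: 3D keypoints whose confidence meets the threshold.
function confident3D(pose, threshold = 0.5) {
  return pose.keypoints3D.filter((kp) => kp.confidence >= threshold);
}

for (const kp of confident3D(pose)) {
  console.log(`${kp.name}: (${kp.x}, ${kp.y}, ${kp.z}) m`);
}
```

Keeping the full arrays under keypoints and keypoints3D makes loops like this straightforward, while the named properties cover the common case of reading a single body part.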

These are great suggestions! I'd love to hear everyone's feedback during the meeting today!

Hello web team! Just noting this has now been incorporated so we can update the documentation! See ml5js/ml5-next-gen#215

@shiffman @MOQN @ziyuan-linn Hi team, @leey611 and I are currently working on the documentation to address this issue. While the named keypoints update is working well, we've discovered that the keypoints array still includes the "wrong" z values. We haven't addressed that in the ml5@1.1.0 update, have we?

Ah, I just checked, and you are right! Let me make a quick fix for this, and we can do a 1.1.1 release!

Wondering if this issue has been completed and can be closed? For reference, you can see my BlazePose 3D tutorial: https://thecodingtrain.com/tracks/ml5js-beginners-guide/ml5/7-bodypose/blazePose

I think so too! Let me double check 👍