akanazawa/hmr

Predicted parameters of the weak perspective projection

longbowzhang opened this issue · 3 comments

Hi @akanazawa, sorry to bother you.
I am confused about the predicted parameters of the weak perspective projection.

  1. You mentioned that the scale s that HMR recovers is essentially focal_length/z, but the following line

    tz = flength / (0.5 * img_size * cam_s)
    suggests that 0.5 * img_size also comes into play. Why?

  2. This line of code

    vert_shifted = verts + trans
    suggests that verts and trans (where trans = np.hstack([cam_pos, tz])) are in the same space, but which space is that?

Could you elaborate a little on the parameters of this weak perspective projection?

Thanks in advance.

jszgz commented

Hello, do you know how to use mpi_inf_3dhp_to_tfrecords.py to convert the mpi_inf_3dhp dataset? It failed for me because the code expects jpg images as input, but the dataset I downloaded consists of videos. Do I need to use ffmpeg and write code to convert the avi files to jpg?

nnop commented

In case someone comes across this issue:
For the 1st question, the keypoints are normalized to [-1, 1] during data preprocessing.

hmr/src/data_loader.py

Lines 320 to 325 in f149abe

    # Normalize kp output to [-1, 1]
    final_vis = tf.cast(crop_kp[2, :] > 0, tf.float32)
    final_label = tf.stack([
        2.0 * (crop_kp[0, :] / self.output_size) - 1.0,
        2.0 * (crop_kp[1, :] / self.output_size) - 1.0, final_vis
    ])

So the predicted s is expressed in normalized coordinates and should be rescaled by 0.5 * img_size to get back to pixels in the original image.
That gives tz = f / (0.5 * img_size * cam_s). This is a subtle detail.
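
To make the rescaling concrete, here is a small numerical check (a sketch, not code from the repo). It assumes the weak perspective convention s * (x + t) in normalized coordinates, a square crop, and a principal point at the image center. The values of img_size, flength, cam_s, cam_pos and the points themselves are made up for illustration. The points have zero depth so that weak perspective is exact.

```python
import numpy as np

img_size = 224                     # assumed square crop size, in pixels
flength = 500.0                    # assumed focal length, in pixels
cam_s = 0.9                        # illustrative predicted scale
cam_pos = np.array([0.05, -0.1])   # illustrative predicted (tx, ty)

# Illustrative 3D points with z = 0, so weak perspective is exact.
verts = np.array([[0.3, -0.2, 0.0],
                  [-0.5, 0.4, 0.0],
                  [0.0, 0.0, 0.0]])


def weak_perspective_pixels(verts, cam_s, cam_pos, img_size):
    """Project with s * (x + t) in [-1, 1], then undo the normalization."""
    normalized = cam_s * (verts[:, :2] + cam_pos)
    # Inverse of 2 * (kp / img_size) - 1
    return (normalized + 1.0) * 0.5 * img_size


def full_perspective_pixels(verts, cam_s, cam_pos, img_size, flength):
    """Shift the points into the camera frame and apply a pinhole projection."""
    tz = flength / (0.5 * img_size * cam_s)
    trans = np.hstack([cam_pos, tz])
    vert_shifted = verts + trans
    xy = flength * vert_shifted[:, :2] / vert_shifted[:, 2:3]
    # Principal point assumed at the image center.
    return xy + 0.5 * img_size


px_weak = weak_perspective_pixels(verts, cam_s, cam_pos, img_size)
px_full = full_perspective_pixels(verts, cam_s, cam_pos, img_size, flength)
print(np.allclose(px_weak, px_full))
```

Both paths give the same pixel coordinates because flength / tz = 0.5 * img_size * cam_s, which is exactly the predicted scale converted from normalized units to pixels.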

For the 2nd question, verts and trans are both in the camera frame, which is not consistent with the paper's equation.

@nnop
Hi, the keypoints are normalized to [-1, 1]. When using weak perspective projection, won't this cause errors when projecting 3D points to 2D? The normalized 2D keypoints lose the ratio between width and height, while the 3D points still retain the aspect ratio information.