Predicted parameters of the weak perspective projection

Hi @akanazawa, sorry to bother you.
I am confused about the predicted parameters of the weak perspective projection.

You mentioned that the scale s that HMR recovers is essentially focal_length/z, but the following line

hmr/src/util/renderer.py

Line 247 in bce0ef9

tz = flength / (0.5 * img_size * cam_s)

suggests that 0.5 * img_size also comes into play. Why?
This line of code

hmr/src/util/renderer.py

Line 249 in bce0ef9

vert_shifted = verts + trans

suggests that verts and trans (where trans = np.hstack([cam_pos, tz])) are in the same space, but which space?

Thus, could you elaborate a little bit on the parameters of this weak perspective projection?

Thanks in advance.

Hello, do you know how to use mpi_inf_3dhp_to_tfrecords.py to convert the mpi_inf_3dhp dataset? It failed for me because the code expects jpg input, but the dataset I downloaded consists of videos. Do I need to use ffmpeg and write code to convert the avi files to jpg?

In case someone comes across this issue:
For the 1st question: the keypoints are normalized to [-1, 1] during data preprocessing.

hmr/src/data_loader.py

Lines 320 to 325 in f149abe

    
    # Normalize kp output to [-1, 1]
    final_vis = tf.cast(crop_kp[2, :] > 0, tf.float32)
    final_label = tf.stack([
        2.0 * (crop_kp[0, :] / self.output_size) - 1.0,
        2.0 * (crop_kp[1, :] / self.output_size) - 1.0, final_vis
    ])

So the predicted s lives in normalized [-1, 1] coordinates and should be rescaled by 0.5 * img_size to get a scale in image pixels.
That gives tz = f / (0.5 * img_size * cam_s). This is a subtle detail.
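
A quick numerical check of that relation may help. This is only a sketch: it assumes the weak perspective projection has the form s * (X + t), that the principal point sits at the image center, and that the point lies at depth z = 0 relative to the mesh origin. All numeric values are made up for illustration.

```python
# Illustrative values only; none of these numbers come from the HMR repo.
img_size = 224            # side of the square network input, in pixels
flength = 500.0           # focal length, in pixels
cam_s, cam_tx = 0.9, 0.1  # predicted scale and x-translation
x = 0.3                   # x-coordinate of a 3D point (its z is taken as 0)

# Weak perspective in normalized coordinates, assumed form s * (X + t).
x_norm = cam_s * (x + cam_tx)
# Undo the preprocessing 2 * (px / img_size) - 1 to get back to pixels.
px_weak = 0.5 * img_size * (x_norm + 1.0)

# Pinhole projection with the principal point assumed at the image center,
# using the depth formula from renderer.py.
tz = flength / (0.5 * img_size * cam_s)
px_persp = flength * (x + cam_tx) / tz + 0.5 * img_size

# The two pixel coordinates agree up to floating-point error.
print(px_weak, px_persp)
```

Under these assumptions the two projections coincide exactly when flength / tz equals 0.5 * img_size * cam_s, which is where the extra factor comes from.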

For the 2nd question: verts and trans are in the camera frame, which is not consistent with the paper's equation.

@nnop
Hi, the keypoints are normalized to [-1, 1]. When using weak perspective projection, won't this cause errors when projecting 3D points to 2D? The 2D keypoints lose the ratio between width and height, while the 3D points still retain their aspect ratio.

	# Normalize kp output to [-1, 1]
	final_vis = tf.cast(crop_kp[2, :] > 0, tf.float32)
	final_label = tf.stack([
		2.0 * (crop_kp[0, :] / self.output_size) - 1.0,
		2.0 * (crop_kp[1, :] / self.output_size) - 1.0, final_vis
	])
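
One thing worth checking in the snippet above: both x and y are divided by the same self.output_size, so the normalization looks like a uniform scale plus a shift. The sketch below assumes the crop is square (as the single output_size suggests); normalize_kp is a hypothetical helper written for illustration, not a function from the repo.

```python
def normalize_kp(px, py, output_size):
    """Hypothetical helper mirroring the normalization in the snippet."""
    return 2.0 * (px / output_size) - 1.0, 2.0 * (py / output_size) - 1.0

output_size = 224  # illustrative square crop size

# Two keypoints in pixel coordinates (made-up values).
ax, ay = normalize_kp(50.0, 100.0, output_size)
bx, by = normalize_kp(90.0, 180.0, output_size)

# Width-to-height ratio of the segment before and after normalization.
ratio_px = (90.0 - 50.0) / (180.0 - 100.0)
ratio_norm = (bx - ax) / (by - ay)

print(ratio_px, ratio_norm)
```

If that assumption holds, the width-to-height ratio is unchanged by the normalization. It would only be distorted if x and y were divided by different sizes, for example a non-square crop normalized per axis.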