facebookresearch/InterHand2.6M

The scale of the depth

ZYX-MLer opened this issue · 1 comment

I've been studying the data-processing part of the code recently, and I noticed that in the __getitem__ function, when the bounding-box size of the input image does not match cfg.input_img_shape, the image is cropped and resized along the x and y dimensions, and the 2D joint coordinates are remapped accordingly by the code at the bottom of the augmentation function:

for i in range(joint_num):
    # warp each joint's x, y into the cropped/resized input-image space
    joint_coord[i, :2] = trans_point2d(joint_coord[i, :2], trans)
    # invalidate joints that land outside the input image
    joint_valid[i] = (joint_valid[i]
                      * (joint_coord[i, 0] >= 0) * (joint_coord[i, 0] < cfg.input_img_shape[1])
                      * (joint_coord[i, 1] >= 0) * (joint_coord[i, 1] < cfg.input_img_shape[0]))
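
For reference, trans_point2d just applies the 2x3 affine matrix (the same trans passed to cv2.warpAffine when the crop is resized) to a point in homogeneous coordinates; a minimal sketch consistent with the repo's utilities:

import numpy as np

def trans_point2d(pt_2d, trans):
    # trans is a 2x3 affine warp matrix; multiplying the homogeneous point
    # keeps the joint aligned with the warped (cropped + resized) image
    src_pt = np.array([pt_2d[0], pt_2d[1], 1.0])
    return (trans @ src_pt)[0:2]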

However, no corresponding processing is applied to the depth. So during actual training, the x and y values the model predicts are expressed in the resized training-image space, while the depth values are still in camera-space millimeters. Is this understanding correct? Should the depth also be scaled accordingly?
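
A toy illustration of the asymmetry, with made-up numbers (here trans is a pure 2x scale; the repo actually builds it from the bbox plus rotation/flip augmentation):

import numpy as np

trans = np.array([[2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])   # e.g. a 128x128 crop resized to 256x256

joint = np.array([50.0, 60.0, 30.0])  # x (px), y (px), z (mm, camera space)
joint[:2] = trans @ np.array([joint[0], joint[1], 1.0])
print(joint)  # [100. 120.  30.] -- x and y are rescaled, z is untouched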

You can see this function, which is where the depth is handled, separately from the x/y crop:

def transform_input_to_output_space(joint_coord, joint_valid, rel_root_depth, root_valid, root_joint_idx, joint_type):
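    # (Sketch of the body, paraphrased from the released code; constant names
    # such as cfg.bbox_3d_size are taken from the repo's config -- verify them
    # against your checkout.)
    joint_coord, joint_valid = joint_coord.copy(), joint_valid.copy()

    # x, y: input-image pixels -> output heatmap cells
    joint_coord[:, 0] = joint_coord[:, 0] / cfg.input_img_shape[1] * cfg.output_hm_shape[2]
    joint_coord[:, 1] = joint_coord[:, 1] / cfg.input_img_shape[0] * cfg.output_hm_shape[1]

    # z: make each hand's depth relative to its own root (wrist) joint, then
    # normalize millimeters into [0, output_hm_shape[0]) discretized depth bins
    joint_coord[joint_type['right'], 2] -= joint_coord[root_joint_idx['right'], 2]
    joint_coord[joint_type['left'], 2] -= joint_coord[root_joint_idx['left'], 2]
    joint_coord[:, 2] = (joint_coord[:, 2] / (cfg.bbox_3d_size / 2) + 1) / 2.0 * cfg.output_hm_shape[0]
    joint_valid = joint_valid * (joint_coord[:, 2] >= 0) * (joint_coord[:, 2] < cfg.output_hm_shape[0])

    # the relative root depth is normalized the same way with its own constants
    rel_root_depth = (rel_root_depth / (cfg.bbox_3d_size_root / 2) + 1) / 2.0 * cfg.output_root_hm_shape
    root_valid = root_valid * (rel_root_depth >= 0) * (rel_root_depth < cfg.output_root_hm_shape)

    return joint_coord, joint_valid, rel_root_depth, root_valid

In other words, the depth target is root-relative and normalized by a fixed 3D box size, so it is deliberately independent of the 2D crop/resize scale.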