JingwenWang95/DSP-SLAM

What does it mean to scale the rotation matrix by `T_cam_obj[:3, :3] *= l`?

qinyq opened this issue · 2 comments

qinyq commented

As stated in kitti_sequence.py, line 146:

```python
T_cam_obj[:3, :3] *= l
```

Moreover, it seems that the detected 3D box has a yaw angle (i.e. rotation about the z-axis), yet `T_velo_obj` looks as if it is constructed as a rotation about the y-axis?

```python
T_velo_obj = np.array([[np.cos(theta), 0, -np.sin(theta), trans[0]],
```

It's a little confusing.

JingwenWang95 commented

Hi @qinyq! Sorry for the late response; we have been working towards a deadline these days. The first question: the object pose T_cam_obj has 7 DoF, because in ShapeNet coordinates everything is normalized to a unit sphere, so the pose carries rotation, translation and scale. Here we use the length of the detection box as the initial scale for optimisation at a later stage.
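To make that concrete, here is a minimal sketch (illustrative only, not the DSP-SLAM source; `R`, `t` and `l` are placeholder values) of how multiplying the rotation block by `l` turns the 4x4 pose into a 7-DoF similarity transform, and how the scale can be read back off:

```python
import numpy as np

# Placeholder pose: identity rotation, arbitrary translation and box length.
R = np.eye(3)                   # camera->object rotation (placeholder)
t = np.array([1.0, 0.0, 5.0])   # translation (placeholder)
l = 3.9                         # detection box length, used as initial scale

T_cam_obj = np.eye(4)
T_cam_obj[:3, :3] = R
T_cam_obj[:3, 3] = t
T_cam_obj[:3, :3] *= l  # the line in question: rotation block becomes l * R

# The scale is recoverable as the norm of any column of the 3x3 block,
# because the columns of a pure rotation matrix are unit vectors.
scale = np.linalg.norm(T_cam_obj[:3, 0])
assert np.isclose(scale, l)
```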

The second question: it is actually rotated about the z-axis. Here we have two coordinate systems: velo (forward, left, upward) and object (right, upward, backward). The velo coordinate follows the definition in the KITTI dataset and the object coordinate follows the ShapeNet definition. We want the object-to-velo transformation matrix T_velo_obj; for the rotation part, you just need to express the three basis vectors of the object frame in the velo frame. We only consider the yaw angle here, so object-y and velo-z are always aligned, which is why the second column is always [0, 0, 1]. For the first and third columns, you write down the coordinates of object-x and object-z in the velo frame; a sketch of this construction is given below. Note that different detectors may define the yaw angle with different conventions, so you need to be careful.
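As a rough sketch of that column-by-column construction (the sign choices depend on the detector's yaw convention, so treat this as illustrative rather than the exact code in kitti_sequence.py):

```python
import numpy as np

def T_velo_obj_from_yaw(theta, trans):
    """Object-to-velo transform built column by column from a yaw angle.

    The columns of the rotation block are the object basis vectors
    (x: right, y: up, z: backward) expressed in the velo frame
    (x: forward, y: left, z: up). The signs here assume one particular
    yaw convention; adjust for your detector.
    """
    # object-x (right), rotated by yaw within the velo x-y plane
    obj_x = np.array([np.cos(theta), -np.sin(theta), 0.0])
    # object-y (up) is always aligned with velo-z: the fixed [0, 0, 1] column
    obj_y = np.array([0.0, 0.0, 1.0])
    # object-z (backward) follows from right-handedness: x cross y
    obj_z = np.cross(obj_x, obj_y)

    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2] = obj_x, obj_y, obj_z
    T[:3, 3] = trans
    return T

# The first row of the result is [cos(theta), 0, -sin(theta), trans[0]],
# matching the pattern quoted from kitti_sequence.py above.
```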

Hope this answers your questions.

qinyq commented


Thanks! My questions are addressed.