JingwenWang95/DSP-SLAM

What does it mean to scale the rotation matrix by `T_cam_obj[:3, :3] *= l`?

qinyq opened this issue · 2 comments

qinyq commented

As stated in kitti_sequence.py, line 146:

```python
T_cam_obj[:3, :3] *= l
```

Moreover, it seems that the detected 3D box has a yaw angle (i.e. rotation about the z-axis), yet `T_velo_obj` looks as if it is constructed as a rotation about the y-axis?

```python
T_velo_obj = np.array([[np.cos(theta), 0, -np.sin(theta), trans[0]],
```

It's a little confusing.

JingwenWang95 commented

Hi @qinyq! Sorry for the late response; we have been working towards a deadline these days. The first question: the object pose T_cam_obj has 7 DoF, because in ShapeNet coordinates everything is normalized to a unit sphere, so the pose carries rotation, translation and scale. Here we use the length of the detection box as the initial scale for optimisation at a later stage.
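To make that concrete, here is a minimal sketch (illustrative only, not the DSP-SLAM source; `R`, `t` and `l` are placeholder values) of how multiplying the rotation block by `l` turns the 4x4 pose into a 7-DoF similarity transform, and how the scale can be read back off:

```python
import numpy as np

# Placeholder pose: identity rotation, arbitrary translation and box length.
R = np.eye(3)                   # camera->object rotation (placeholder)
t = np.array([1.0, 0.0, 5.0])   # translation (placeholder)
l = 3.9                         # detection box length, used as initial scale

T_cam_obj = np.eye(4)
T_cam_obj[:3, :3] = R
T_cam_obj[:3, 3] = t
T_cam_obj[:3, :3] *= l  # the line in question: rotation block becomes l * R

# The scale is recoverable as the norm of any column of the 3x3 block,
# because the columns of a pure rotation matrix are unit vectors.
scale = np.linalg.norm(T_cam_obj[:3, 0])
assert np.isclose(scale, l)
```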

The second question: it is actually rotated about the z-axis. Here we have two coordinate systems: velo (forward, left, upward) and object (right, upward, backward). The velo coordinate follows the definition in the KITTI dataset and the object coordinate follows the ShapeNet definition. We want the object-to-velo transformation matrix T_velo_obj; for the rotation part, you just need to express the three basis vectors of the object frame in the velo frame. We only consider the yaw angle here, so object-y and velo-z are always aligned, which is why the second column is always [0, 0, 1]. For the first and third columns, you write down the coordinates of object-x and object-z in the velo frame; a sketch of this construction is given below. Note that different detectors may define the yaw angle with different conventions, so you need to be careful.
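As a rough sketch of that column-by-column construction (the sign choices depend on the detector's yaw convention, so treat this as illustrative rather than the exact code in kitti_sequence.py):

```python
import numpy as np

def T_velo_obj_from_yaw(theta, trans):
    """Object-to-velo transform built column by column from a yaw angle.

    The columns of the rotation block are the object basis vectors
    (x: right, y: up, z: backward) expressed in the velo frame
    (x: forward, y: left, z: up). The signs here assume one particular
    yaw convention; adjust for your detector.
    """
    # object-x (right), rotated by yaw within the velo x-y plane
    obj_x = np.array([np.cos(theta), -np.sin(theta), 0.0])
    # object-y (up) is always aligned with velo-z: the fixed [0, 0, 1] column
    obj_y = np.array([0.0, 0.0, 1.0])
    # object-z (backward) follows from right-handedness: x cross y
    obj_z = np.cross(obj_x, obj_y)

    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2] = obj_x, obj_y, obj_z
    T[:3, 3] = trans
    return T

# The first row of the result is [cos(theta), 0, -sin(theta), trans[0]],
# matching the pattern quoted from kitti_sequence.py above.
```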

Hope this answers your questions.

qinyq commented


Thanks! My questions are addressed.