JingwenWang95/KinectFusion

Code run fine, but need clarification regarding pose matrix

Homagn opened this issue · 1 comments

  1. Can we assume that this algorithm only needs an initial pose matrix, or does it need poses for each step in the entire camera trajectory?
  2. If so, how can we specify initial pose for a custom dataset?

I checked the world_mats variable in tum_rgbd.py and I tweeked it to use only store the first pose value in a repeated fashion (eg below)

pose1 = np.array([[ 3.18131473e+02, 4.60102595e+02, -2.37143996e+02, 6.60288595e+00],
[ 1.32248250e+02, -1.87081717e+02, -5.28617550e+02, 5.54961371e+02],
[-3.51650973e-01, 5.42460640e-01, -7.62940395e-01, 1.31740951e+00]])

pose2 = np.array([[ 3.44746770e+02, 4.56713777e+02, -2.04209444e+02, -5.30226155e+02],
[ 9.00883139e+01, -1.36841787e+02, -5.52344190e+02, 8.36896216e+02],
[-3.39983783e-01, 6.50759651e-01, -6.78913032e-01, 9.25806734e-01]])

ini_pose = pose2
tentative_poses = [ini_pose for _ in range(572)]
world_mats = np.stack(tentative_poses, axis=0)

I see that if I set ini_pose=pose1 (a random value), reconstruction messes up, but if I set=pose2 (cameras.npz, first pose matrix) then the code works as usual, so is there something special about the initial pose ?

Hi @Homagn 1. Yes, it only requires the initial pose. 2. Here the most important thing is to define your own world coordinate frame and set your initial camera_to_world transformation accordingly. You can pre-define a world frame (e.g. define it on the floor and point z-direction upward) or simply set the first camera frame as world just like in many other SLAM system. But either ways you need to be consistent. Then you will also need to define the volume bound as in the config file to cover your region of interest. The volume bound should also be in the same world coordinate frame.

Now you should understand why it messed up when you used a random initial pose: it simply messed up with inconsistent coordinate frames. So there is nothing special about the initial pose, it is just the camera_to_world tranformation of the first camera frame. But you have to clearly define the world coordinate frame, volume bound and always be consistent.