kazuotani14/EventBasedVisualOdometry

Is the computation that converts points from camera coordinates to world coordinates in GetNewMapPoints.m correct?

Opened this issue · 8 comments

Hello,
I am interested in EVO; thank you for your amazing work.
But I have a question about line 38 in GetNewMapPoints.m:
new_map_points = [points_in_camera_frame, ones(size(points_in_camera_frame,1),1)]*tform';

Because tform is the transform matrix converting world coordinates to camera coordinates, shouldn't we use the inverse of tform instead of the original one if we want to get the points in world coordinates?
That is:
new_map_points = [points_in_camera_frame, ones(size(points_in_camera_frame,1),1)]*inv(tform)';
Is that right?

I guess the comment on the transform matrix is misleading: tform is actually the camera pose in the world frame, T_c^w. The points are stored as row vectors in the camera frame, p_c, so multiplying on the right by tform' is equivalent to premultiplying the column vectors by T_c^w, which cancels out the c and gives p_w.
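As a quick numerical check of that row-vector convention: multiplying homogeneous row vectors on the right by tform' applies T_c^w to each point, exactly as if the column vectors had been premultiplied. A NumPy sketch (pose and point values here are made up for illustration, not taken from the repo):

```python
import numpy as np

# tform plays the role of T_c^w: the 4x4 camera pose in the world frame
# (made-up values for illustration)
tform = np.eye(4)
tform[:3, :3] = [[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]]          # 90-degree yaw
tform[:3, 3] = [1.0, 2.0, 3.0]              # translation expressed in world frame

points_c = np.array([[0.5, 0.0, 2.0],
                     [1.0, 1.0, 4.0]])      # N x 3 points in the camera frame

# The repo's row-vector form: [p_c, 1] * tform'
homog = np.hstack([points_c, np.ones((len(points_c), 1))])
points_w_rows = homog @ tform.T

# Equivalent column-vector form: T_c^w * [p_c; 1]
points_w_cols = (tform @ homog.T).T

print(np.allclose(points_w_rows, points_w_cols))  # True
```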

Ah, so you mean that the tform matrix actually converts from camera coordinates to world coordinates, not from world to camera?

In that context, yes.

Thank you for your wonderfully quick response. Another point of confusion I have is the way the homography matrix is computed. As far as I know, the equation should be H = K * (R - t * n' / d) * inv(K), but in the code it is H = K * (R - R * t * n' / d) * inv(K); there is an extra R matrix. I checked the paper "Real-time plane-sweeping stereo with multiple sweeping directions" and it matches the code. Is there any difference between the standard homography matrix and the plane-sweep scenario?

Sorry but I'm not sure, we tried a few different methods for calculating H, and this is the one that worked.

My understanding is that in the traditional homography scenario, we project from one camera image plane to another, and we take the source camera to have no rotation and no translation because we don't really care about the world frame. In this case the homography matrix simplifies to your equation H = K * (R - t * n' / d) * inv(K).

In the plane-sweep scenario, the KF depth planes are moving targets in the world frame; such planes have rotations and translations, so we have to use the more general formula H = K * (R - R * t * n' / d) * inv(K).

This Wikipedia page has some more details and the derivations.
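For what it's worth, the standard (first-camera-as-world) formula can be sanity-checked numerically: a point on the plane projected into camera 1 must map through H to its projection in camera 2. A NumPy sketch with made-up intrinsics and pose, using the plane convention n' * X + d = 0 in the first camera's frame:

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])        # made-up intrinsics

# Relative pose of camera 2 w.r.t. camera 1: X2 = R @ X1 + t
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0,           1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, 0.0, 0.05])

# Plane in the camera-1 frame, convention n' * X + d = 0
n = np.array([0.0, 0.0, -1.0])
d = 5.0                                    # i.e. the plane z = 5

# Standard plane-induced homography: H = K (R - t n'/d) inv(K)
H = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

# Verify on a 3D point lying on the plane
X1 = np.array([1.0, -0.5, 5.0])            # n' X1 + d = -5 + 5 = 0
X2 = R @ X1 + t
x1 = K @ X1; x1 /= x1[2]                   # pixel in camera 1
x2 = K @ X2; x2 /= x2[2]                   # pixel in camera 2
x2_pred = H @ x1; x2_pred /= x2_pred[2]    # pixel mapped by H
print(np.allclose(x2_pred, x2))            # True
```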

Yes, that is exactly the reason. I don't know how the formula on Wikipedia is derived, but I used another formula, different from the one in our code but also accounting for the pose of the key-frame camera, and even though the exact form of the formula changed, the resulting H is the same. I also have another question. In GetNewMapPoints, can we directly use zinv(K)(x,y,1)' (where K is the corrected matrix, not in the MATLAB fashion, and z is the depth) to get the 3D points in the camera frame? I tried this method; the code looks like this:
z_values = KF_depths(depth_map(valid_idx))';
points_in_camera_frame = K \ [valid_x.* z_values, valid_y .* z_values, z_values]';
but the result is different from the one produced by the original code.

Sorry, the formula above should be: z * inv(K) * (x, y, 1)'
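That matches the usual pinhole back-projection, and the two forms written above — K \ [x*z, y*z, z]' and z * inv(K) * (x, y, 1)' — are algebraically identical, since the scalar z commutes with inv(K). A NumPy sketch with a made-up intrinsic matrix and point:

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])       # made-up intrinsics

X = np.array([0.4, -0.3, 2.5])            # 3D point in the camera frame

# Forward projection to pixel coordinates
u = K @ X
x, y = u[0] / u[2], u[1] / u[2]
z = X[2]

# Back-projection form 1: z * inv(K) @ [x, y, 1]'
X_back = z * np.linalg.inv(K) @ np.array([x, y, 1.0])
print(np.allclose(X_back, X))             # True

# Back-projection form 2, MATLAB's K \ [x*z, y*z, z]'
X_back2 = np.linalg.solve(K, np.array([x * z, y * z, z]))
print(np.allclose(X_back2, X))            # True
```

If the two MATLAB versions still disagree, the difference likely lies in how K or the pixel indices are defined (e.g. transposed "MATLAB-fashion" intrinsics or 1-based indexing), not in the back-projection formula itself.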