Huangying-Zhan/Depth-VO-Feat

Questions about "Differentiable geometry modules"

lixiangyu-1008 opened this issue · 8 comments

hi @Huangying-Zhan
Thanks for your work. I have some questions about the "Differentiable geometry modules" in your paper and code.

  1. The coordinate projection is formulated as P(L,t1) = K*T(t2->t1)*D(L,t2)K^-1p(L,t2). Does this mean the estimated pose is from t2 to t1, given that the input images are stacked in the order I1, I2? And in your code for evaluating the whole sequences 09 and 10, should the input queue be reversed, i.e. (t2,t1), (t3,t2), (t4,t3)?

  2. Why can't the camera pose in the projection function be T(t1->t2), i.e. from the source view to the target view?

Hi @lixiangyu-1008,

(1) The order of the image stack (I1+I2 or I2+I1) is not related to the direction of the relative pose. We want the estimated pose to be from t2 to t1, and this is constrained by the layers after pose estimation. To summarize, whether the input is I1+I2 or I2+I1 doesn't matter, but we choose I1+I2. The estimated pose is T(t2->t1). The reason is explained below.

(2) In the warping process, we want to synthesize I_t2 from I_t1.
In KITTI the camera moves forward most of the time, so most pixels in I_t2 can also be found in I_t1. If we used T(t1->t2) instead, most pixels in the border region of I_t1 would not be seen in I_t2, so the photometric loss in these regions would be useless. Therefore, we choose T(t2->t1) so that the photometric loss is meaningful in most regions.
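
The visibility argument above can be checked with a small NumPy sketch. It is only an illustration, not code from this repository: the intrinsics, the constant 10 m depth, the 1 m forward step and the function name `visible_fraction` are all assumptions chosen for the example. It counts how many target pixels project inside the source image for each pose direction.

```python
import numpy as np

def visible_fraction(K, T_target_to_source, depth, height, width):
    """Fraction of target pixels whose projection lands inside the source image."""
    # Homogeneous pixel grid of the target view, shape (3, H*W).
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])
    # Back-project to 3D in the target frame: X_t = D(t) * K^-1 * p(t).
    points_t = depth * (np.linalg.inv(K) @ pixels)
    # Express the points in the source frame with T(t->s).
    R, t = T_target_to_source[:3, :3], T_target_to_source[:3, 3:4]
    points_s = R @ points_t + t
    # Project into the source image and normalise by depth.
    proj = K @ points_s
    z = proj[2]
    x, y = proj[0] / z, proj[1] / z
    inside = (z > 0) & (x >= 0) & (x <= width - 1) & (y >= 0) & (y <= height - 1)
    return inside.mean()

# Assumed toy camera: 64x32 image, principal point at the image centre.
height, width = 32, 64
K = np.array([[50.0, 0.0, (width - 1) / 2],
              [0.0, 50.0, (height - 1) / 2],
              [0.0, 0.0, 1.0]])

# The camera moves 1 m forward between t1 and t2, so a point at depth Z in
# the t2 frame has depth Z + 1 in the t1 frame.
T_t2_to_t1 = np.eye(4)
T_t2_to_t1[2, 3] = 1.0
T_t1_to_t2 = np.linalg.inv(T_t2_to_t1)

# Target = I_t2, source = I_t1 (the direction chosen in this thread).
frac_t2_target = visible_fraction(K, T_t2_to_t1, 10.0, height, width)
# Target = I_t1, source = I_t2 (the alternative direction).
frac_t1_target = visible_fraction(K, T_t1_to_t2, 10.0, height, width)
print(frac_t2_target, frac_t1_target)
```

Under these assumptions every pixel of I_t2 lands inside I_t1, while a band of border pixels of I_t1 falls outside I_t2 and would contribute no useful photometric loss.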

Hi @Huangying-Zhan
Thanks for your reply!
(1) The function "getPredPoses" in evaluation_tools.py:
img1_path = seq_path + "/image_02/data/{:010}.png".format(idx)
img2_path = seq_path + "/image_02/data/{:010}.png".format(idx+1)
img1 = self.getImage(img1_path)
img2 = self.getImage(img2_path)
self.odom_net.blobs['imgs'].data[0,:3] = img2
self.odom_net.blobs['imgs'].data[0,3:] = img1

So why is the input img2+img1 instead of img1+img2?

(2) The warping function is defined as
P(s) = K*T(t->s)*D(t)K^-1p(t), where s is the source view, t is the target view, and T is from the target view to the source view.

Can it instead be written as P(s) = K*T(s->t)*D(t)K^-1p(t), where T is from the source view to the target view?

@lixiangyu-1008
(1) Sorry, my mistake. We chose img2+img1. As I mentioned, the order of concatenation shouldn't matter that much.
In the evaluation code, I just follow the convention of the training code, which can be found here:
https://github.com/Huangying-Zhan/Depth-VO-Feat/blob/master/experiments/depth_odometry/train.prototxt#L93

(2) No. The term D(t)K^-1p(t) gives the 3D coordinates of the points in the target view. After applying T(t->s), these 3D points are expressed in the source view coordinate system, and K then projects them onto the source image.
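
As a sanity check of this point, here is a minimal NumPy sketch with made-up values; `K`, the pose `T_t2s`, the 3D point and the helper names are hypothetical and not taken from the repository. It back-projects one target pixel and compares the result of using T(t->s) and T(s->t) against the true source pixel.

```python
import numpy as np

def project(K, point):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    p = K @ point
    return p[:2] / p[2]

def warp_pixel(K, T, depth, pixel):
    """Compute K * T * D * K^-1 * p for a single pixel."""
    p_h = np.array([pixel[0], pixel[1], 1.0])
    X = depth * (np.linalg.inv(K) @ p_h)   # 3D point in the frame of `pixel`
    X_new = T[:3, :3] @ X + T[:3, 3]       # same point after applying T
    return project(K, X_new)

# Assumed intrinsics and an assumed target-to-source pose T(t->s).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
T_t2s = np.eye(4)
T_t2s[:3, 3] = [0.5, 0.0, 1.0]
T_s2t = np.linalg.inv(T_t2s)

# A 3D point given in the target frame, with its target pixel and depth.
X_t = np.array([1.0, -0.5, 5.0])
p_t = project(K, X_t)
D_t = X_t[2]

# Ground truth: the same point expressed in the source frame, then projected.
X_s = T_t2s[:3, :3] @ X_t + T_t2s[:3, 3]
p_s_true = project(K, X_s)

p_s_with_t2s = warp_pixel(K, T_t2s, D_t, p_t)  # formula with T(t->s)
p_s_with_s2t = warp_pixel(K, T_s2t, D_t, p_t)  # formula with T(s->t)
print(p_s_true, p_s_with_t2s, p_s_with_s2t)
```

Because the depth and pixel both live in the target frame, only T(t->s) reproduces the true source pixel; plugging in T(s->t) moves the point the wrong way.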

@Huangying-Zhan
Thanks, I understand the warping process now.
However, in the evaluation code, as you describe, you get the estimated relative poses (1->0), (2->1), (3->2), ... and then evaluate these poses on the KITTI odometry benchmark?
It seems that these are backward poses?

Yes, you are right. I mentioned the reason before: it gives a better warping loss.

In the trajectory evaluation, I convert the relative poses to absolute camera poses w.r.t. frame 0.
Please check the following line:
https://github.com/Huangying-Zhan/Depth-VO-Feat/blob/master/tools/evaluation_tools.py#L168

Thank you. So when we convert the relative poses (1->0), (2->1), (3->2) to absolute camera poses w.r.t. frame 0,
should we first convert them to (0->1), (1->2), (2->3), or just use (1->0), (2->1), (3->2) directly?

Both are OK, depending on how you compute the pose.
For example, to get T(3->0) = T_03, I compute T_03 = T_01 * T_12 * T_23, which uses (1->0), (2->1), (3->2). Here T_ab denotes T(b->a).

If you want to use (0->1), (1->2), (2->3), you can calculate T_03 = inv(T_32 * T_21 * T_10).
It is just a matter of inverse and multiplication order.
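
The equivalence of the two routes can be verified numerically. The sketch below uses three arbitrary 4x4 rigid transforms; the helper `make_pose` and all numeric values are assumptions for illustration, and the accumulation loop is a generic pattern rather than a copy of what `evaluation_tools.py` does.

```python
import numpy as np

def make_pose(yaw, translation):
    """Build a 4x4 rigid transform from a yaw angle (about y) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T[:3, 3] = translation
    return T

# Hypothetical relative poses; T_ab denotes T(b->a), e.g. T_01 = T(1->0).
T_01 = make_pose(0.05, [0.0, 0.0, 1.0])
T_12 = make_pose(-0.02, [0.1, 0.0, 0.9])
T_23 = make_pose(0.03, [0.0, 0.0, 1.1])

# Route 1: chain the backward poses (1->0), (2->1), (3->2) directly.
T_03_direct = T_01 @ T_12 @ T_23

# Route 2: start from the forward poses (0->1), (1->2), (2->3), chain them
# in the opposite order, then invert the product.
T_10, T_21, T_32 = (np.linalg.inv(T) for T in (T_01, T_12, T_23))
T_03_inverted = np.linalg.inv(T_32 @ T_21 @ T_10)

# Absolute poses w.r.t. frame 0 accumulate by right-multiplication.
absolute = [np.eye(4)]
for T_rel in (T_01, T_12, T_23):
    absolute.append(absolute[-1] @ T_rel)

print(np.allclose(T_03_direct, T_03_inverted))
```

Both routes give the same T_03, and the last entry of `absolute` is the pose of frame 3 expressed in frame 0.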

Thank you very much for your patient reply! I understand what you mean now!