Wrong depth scale when using ground-truth camera poses.
ootts opened this issue · 3 comments
Hi, I have a question about using ground-truth camera poses instead of predicted camera poses. I tried using camera poses with the correct (metric) scale on the KITTI dataset, but the resulting depth scale is still incorrect. Is there anything I missed? I only changed the code as follows:
```python
output, lowest_cost, costvol = encoder(input_color, lookup_frames,
                                       relative_poses,  # changed to relative_poses_gt
                                       K,
                                       invK,
                                       min_depth_bin, max_depth_bin)
```
Thanks a lot!
Hi - thanks for your interest in the project!
Right, yes. The problem with this is that the depth network is trained to be in the same scale as the pose network, which is some unknown, arbitrary scale.
I'm trying to think of a way to use the ground-truth poses to scale the depth estimates, but it isn't immediately obvious.
One way you could do it would be to have the depth and pose networks make predictions as normal, and afterwards scale your depths by the ratio of the ground-truth translation to the predicted translation. I can't guarantee this will give a good result, but I'd be interested to hear how you get on.
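The suggestion above could be sketched roughly as follows. This is a hypothetical helper, not part of the ManyDepth codebase; the function name `rescale_depth` and the toy inputs are assumptions for illustration:

```python
import numpy as np

def rescale_depth(depth_pred, t_pred, t_gt):
    """Scale a predicted depth map toward metric units using the ratio of
    ground-truth to predicted translation magnitudes (a heuristic only)."""
    # small epsilon guards against a near-zero predicted translation
    scale = np.linalg.norm(t_gt) / (np.linalg.norm(t_pred) + 1e-8)
    return depth_pred * scale

# toy example: the predicted baseline is half the true one,
# so the depths should roughly double
depth = np.full((2, 2), 5.0)
t_pred = np.array([0.0, 0.0, 0.4])
t_gt = np.array([0.0, 0.0, 0.8])
print(rescale_depth(depth, t_pred, t_gt))
```

Since monocular self-supervised depth and pose share one arbitrary scale, a single per-frame ratio like this is the simplest correction, though it inherits any error in the predicted translation.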
@mdfirman any thoughts?
I tried using the ground-truth poses and dropping the pose network in monodepth, and the output scale is almost correct (about 0.9 × gt_depth), so I assume this would work for ManyDepth too?
What I'm wondering is how to fine-tune a pretrained model, whose scale is arbitrary, to get real-world-scale results. In monodepth I scale the ground truth to the pretrained model's scale during training, and scale back at prediction time. I wonder if there's a better way to do this.
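The scale-and-scale-back workaround described above could look like the sketch below. The helper `median_scale` is a hypothetical name; the per-image median-scaling idea follows the common Eigen-split evaluation convention, and the calibration values are made up:

```python
import numpy as np

def median_scale(pred_depths, gt_depths):
    """Estimate one global ratio between the pretrained model's arbitrary
    depth scale and metric ground truth, via per-image median scaling."""
    ratios = [np.median(g) / np.median(p) for p, g in zip(pred_depths, gt_depths)]
    return float(np.median(ratios))

# toy calibration set: predictions are consistently 10x too small
preds = [np.full((2, 2), 0.5), np.full((2, 2), 0.25)]
gts = [np.full((2, 2), 5.0), np.full((2, 2), 2.5)]
s = median_scale(preds, gts)
print(s)  # 10.0

# at prediction time, multiply network output by s to recover metric depth;
# equivalently, divide ground truth by s when fine-tuning at the pretrained scale
metric_pred = preds[0] * s
```

Estimating the ratio once on a held-out calibration set, rather than per test image, keeps predictions consistent across frames.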
@biggiantpigeon Hi, have you tried using ground-truth poses in ManyDepth? I wonder whether it is feasible. Thank you!