baegwangbin/MaGNet

How do you predict the scale of the depth

wzc2021 opened this issue · 1 comment

The current model input only includes relative pose. I don't see any model input related to depth scale (such as a depth range or the absolute values of the extrinsics). If a dataset contains images with mixed depth scales (e.g. drone footage and regular photos), your training/inference can fail.

Hi,

As mentioned in the paper, MaGNet can suffer when the physical scale of the scene differs significantly from that of the training scenes (e.g. if you train on ScanNet and test on KITTI).

A possible alternative would be to (1) train D-Net to estimate relative depth (e.g. by using a scale-invariant training loss), and (2) for each input sequence, find the optimal scaling factor that minimizes the reprojection error, as in the sketch below.
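Here is a minimal sketch of both steps, assuming PyTorch. The function names (`scale_invariant_log_loss`, `best_scale`), the Eigen-style loss formulation, and the brute-force scale sweep over a photometric reprojection error are illustrative assumptions, not part of the MaGNet codebase:

```python
import torch
import torch.nn.functional as F

def scale_invariant_log_loss(pred, gt, mask, lam=0.85):
    """Eigen-style scale-invariant log loss: a global scaling of `pred`
    leaves the loss (almost, for lam < 1) unchanged, so the network
    effectively learns relative depth."""
    d = torch.log(pred[mask]) - torch.log(gt[mask])
    return torch.sqrt((d ** 2).mean() - lam * d.mean() ** 2)

def photometric_error(scale, depth_ref, img_ref, img_src, K, T_ref_to_src):
    """Warp img_src into the reference view using (scale * depth_ref),
    the intrinsics K and the relative pose T_ref_to_src, and return the
    mean L1 photometric error."""
    B, _, H, W = img_ref.shape
    device = depth_ref.device
    # Homogeneous pixel grid of the reference view
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)
    # Backproject with the scaled depth, then transform to the source frame
    cam = torch.inverse(K) @ pix * (scale * depth_ref.reshape(B, 1, -1))
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = (T_ref_to_src @ cam_h)[:, :3]
    # Project into the source image and normalize to [-1, 1] for grid_sample
    proj = K @ src
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(img_src, grid, align_corners=True)
    return (warped - img_ref).abs().mean()

def best_scale(depth_ref, img_ref, img_src, K, T_ref_to_src, candidates=None):
    """Brute-force sweep over candidate global scales; return the one
    with the lowest reprojection (photometric) error."""
    if candidates is None:
        candidates = torch.logspace(-1, 1, 41)
    errors = [photometric_error(s.item(), depth_ref, img_ref, img_src,
                                K, T_ref_to_src) for s in candidates]
    return candidates[torch.argmin(torch.tensor(errors))].item()
```

A coarse logarithmic sweep like this is only meant to illustrate the idea; in practice you could refine the scale with a few Gauss-Newton steps, or estimate it per sequence from sparse SfM points if they are available.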

We assumed a scenario where the network is trained and tested on indoor scenes. In such a scenario, single-view predictions can act as a useful prior on the scene geometry.