princeton-vl/DeepV2D

Results even better using "default" validation method?

cgebbe opened this issue · 2 comments

I believe your results might be even slightly better if you use the default validation method:

The paper states that you directly use the 192x1088 output image of the CNN for evaluation. In contrast, other papers first resize the inferred depth map to the RGB image size, crop it, and then evaluate, see https://github.com/nianticlabs/monodepth2/blob/master/evaluate_depth.py#L187

You can do the same if you first pad the output image by 108 pixels to undo the previous cropping and then perform the resizing and cropping, as sketched below. In that case I get an absRelErr=0.0640. I believe the improvement comes from some artifacts at the top of the prediction, which are simply cropped away with this method.
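For concreteness, the evaluation I have in mind looks roughly like this. The 108-pixel pad at the top and the replicate padding are my own assumptions; the crop constants are the Garg/Eigen crop from the linked monodepth2 script:

```python
import numpy as np
import cv2

def evaluate_default(pred_192x1088, gt_depth, pad_top=108,
                     min_depth=1e-3, max_depth=80.0):
    """Sketch of the 'default' evaluation described above.

    pred_192x1088 : network output (H=192, W=1088)
    gt_depth      : ground-truth depth at full RGB resolution
    pad_top       : pixels padded back on top to undo the training crop
                    (assumed placement and value)
    """
    # 1) undo the crop by padding the prediction (replicate border as a guess)
    pred = cv2.copyMakeBorder(pred_192x1088, pad_top, 0, 0, 0,
                              borderType=cv2.BORDER_REPLICATE)

    # 2) resize the prediction to the full RGB / ground-truth resolution
    gt_h, gt_w = gt_depth.shape
    pred = cv2.resize(pred, (gt_w, gt_h), interpolation=cv2.INTER_LINEAR)

    # 3) standard Garg/Eigen crop, as in monodepth2's evaluate_depth.py
    crop = np.array([0.40810811 * gt_h, 0.99189189 * gt_h,
                     0.03594771 * gt_w, 0.96405229 * gt_w]).astype(np.int32)
    mask = np.logical_and(gt_depth > min_depth, gt_depth < max_depth)
    crop_mask = np.zeros_like(mask)
    crop_mask[crop[0]:crop[1], crop[2]:crop[3]] = 1
    mask = np.logical_and(mask, crop_mask)

    # 4) abs rel error on the valid, cropped pixels
    pred_m = np.clip(pred[mask], min_depth, max_depth)
    gt_m = gt_depth[mask]
    return np.mean(np.abs(gt_m - pred_m) / gt_m)
```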

Note, however, that I skipped some of the 697 images from the Eigen split whenever one of the four neighboring images was not available. How did you deal with these cases? It is not mentioned in the paper.

Hi, thanks for the information. We trained on 192x1088 images, but we evaluate on the full image like other methods. Our evaluation code is directly from https://github.com/tinghuiz/SfMLearner (with scale matching removed), which resizes our predicted depth maps to match the full image, then applies the standard crop. I've also tested the monodepth evaluation code (https://github.com/mrharicot/monodepth) and got the exact same results.
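For reference, the metric computation in that style of evaluation boils down to roughly the following (a sketch of the standard KITTI metrics applied to the already-cropped, valid pixels; note that no median scale matching is applied):

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard KITTI depth metrics on masked/cropped pixel vectors."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)

    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```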

I think the better performance might be because you are only testing the model on images with all 4 neighboring frames available. When not all neighboring images are available (for example, when the test image appears as the first frame in the video), we simply duplicate the first frame, so the five-frame video would have frame indices (0 0 0 1 2); see the sketch below. This way our results are directly comparable to prior work.
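A minimal sketch of this index handling (the function name and window size are hypothetical, but it reproduces the (0 0 0 1 2) example):

```python
def neighbor_indices(keyframe_idx, num_frames, radius=2):
    """Return the indices of the keyframe and its neighbors, duplicating
    the first/last frame when the window runs past the sequence boundary."""
    return [min(max(i, 0), num_frames - 1)
            for i in range(keyframe_idx - radius, keyframe_idx + radius + 1)]

# Example: test image is the first frame of the sequence
assert neighbor_indices(0, 100) == [0, 0, 0, 1, 2]
```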

@cgebbe @zachteed Hi, I see that the absRelErr reported in the paper is 0.037. Why do you say that 0.0640 is better? Many thanks!