jahaniam/semiDepth

KITTI quantitative comparison and visualized qualitative comparison

joseph-zhang opened this issue · 3 comments

Hello,
Thank you very much for sharing your excellent work.

I entirely agree that we should use the official KITTI depth data in the experiments. As discussed in "evaluation Eigen split #13" and "Inacurate Eigen Split Evaluation #166", whether we use the official depth data or the raw KITTI data has a great effect on the final evaluation result.


I noticed that you compared your test result with DORN on the 652 official depth maps (the part that overlaps with the Eigen split); the RMSE of DORN reported in your paper is 2.888 (2.727 in the original DORN paper, a very close value). However, the number of test images mentioned in the DORN paper is 697, the same as the Eigen split. (They didn't give detailed information about the composition of their test set. Do you have any idea about this?)

Actually, I have used the model provided in the DORN project to evaluate on all 697 Eigen-split test images (where the depth GT is generated from the raw KITTI data using monodepth's script), with the flag option 'vel_depth'=True. I got a cap-80m RMSE of 4.2643 (with 'vel_depth'=False, the RMSE is 4.3024). Do you think this means that the DORN model was trained on the official KITTI depth data? Which DORN model did you use in your quantitative comparison? (The model provided by hufu6371/DORN?)
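For context, the cap-80m RMSE above follows the usual Eigen evaluation convention: only pixels with valid ground truth below 80 m are scored, and predictions are clipped to the same range before computing the root-mean-square error. A minimal sketch of that metric (function and argument names are mine, not from either codebase):

```python
import numpy as np

def rmse_capped(pred, gt, max_depth=80.0, min_depth=1e-3):
    """RMSE over valid GT pixels, with depths capped at max_depth
    (the standard cap-80m setting in Eigen-split evaluations)."""
    mask = (gt > min_depth) & (gt < max_depth)        # keep only pixels with valid GT
    pred = np.clip(pred[mask], min_depth, max_depth)  # cap predictions to the same range
    return np.sqrt(np.mean((pred - gt[mask]) ** 2))
```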


I have also noticed that you constructed some 'filenames' files to support loading the official KITTI data. The file 'eigen_train_files_withGT_annotated.txt' contains the filenames of the official depth ground truth for the training phase. It shows that you use some depth data located in '/KITTI/KITTI_Depth/extras/...', which is not part of the official KITTI depth data. How did you generate these "extra" depth training data? (Did you use raw KITTI data here? I noticed the paths contain '.../velodyne_raw/...')


At the end of your paper, Figure 4 shows the visually pleasing results of your method. The qualitative comparison is very interesting. Could you please give some more details about how the dense color ground truth is generated for visualization? How did you interpolate to produce the dense GT, and what kind of data did you use in the interpolation (raw KITTI Velodyne data?)


Once again, thanks for sharing your code. This work is very meaningful; it promotes the establishment of a fair KITTI evaluation protocol in the depth estimation field.

Hi Joseph

Thank you for your comment.


I remember that at first, when I used the LiDAR projections with the monodepth evaluation code, I got a lower RMSE for DORN than 2.888, so I thought they used LiDAR. I would not be surprised if they used the official KITTI depth data in their training: they won the Robust Vision Challenge, and the official KITTI depth benchmark was built for that challenge (or vice versa), so they were certainly aware that the official depth data exists. Yes, I used their provided model and inference code (Caffe). I am sure I didn't get 4.3; I suggest you double-check your code to see if everything is alright!


About the 'extras': you are right. I wanted a fair comparison (in both training and evaluation), so I generated the files that weren't in the official KITTI data from the LiDAR. In the end I think I didn't use them, so you can simply remove the images with 'extras'; it shouldn't affect the result much. I will update those files soon, but for now, if you want to train the network, just remove every line that contains 'extras'.
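Removing those lines is a one-liner; something like the following sketch should do it (the helper and output filename are my own, not part of the repo):

```python
def drop_extras(in_path, out_path):
    """Copy a filenames list, skipping every line that mentions 'extras'."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if "extras" not in line:
                fout.write(line)

# Example:
# drop_extras("utils/filenames/eigen_train_files_withGT_annotated.txt",
#             "utils/filenames/eigen_train_files_no_extras.txt")
```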


For visualization, I used the monodepth interpolation code by setting interp=True. But this is not a good interpolation, especially along the top rows; there are a couple of better techniques for densifying the LiDAR.
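The idea behind that interpolation is to linearly interpolate between the sparse LiDAR returns over the image grid (monodepth's lin_interp helper works along these lines; the sketch below is my own approximation of it, with zeros treated as missing pixels):

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def densify(sparse_depth):
    """Densify a sparse LiDAR depth map by linear interpolation
    over the pixels that have returns; pixels outside the convex
    hull of the returns stay at 0 (empty)."""
    ys, xs = np.nonzero(sparse_depth)                 # coordinates of LiDAR returns
    interp = LinearNDInterpolator(
        np.stack([ys, xs], axis=1),                   # sample points (row, col)
        sparse_depth[ys, xs],                         # depth values at those points
        fill_value=0.0)
    grid_y, grid_x = np.mgrid[0:sparse_depth.shape[0], 0:sparse_depth.shape[1]]
    return interp(grid_y, grid_x)
```

Since the topmost image rows have no LiDAR returns at all, they fall outside the convex hull of the sample points, which is exactly why this kind of interpolation looks poor at the top of the image.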


I agree. For KITTI we have reached a point where we cannot rely on LiDAR evaluation: the LiDAR itself has high error (see Table I). So it makes sense that evaluating against the LiDAR versus the official KITTI depth can differ by 1-2 in RMSE.


If you are interested in reproducing the result I got for DORN, you can download my DORN results for the 697 Eigen files from this link.

Then use this code. It first filters the results down to the 652 files for which GT exists, and then evaluates those 652 files against the official semi-dense ground truth:

python2 utils/evaluate_kitti_depth.py --split eigen --predicted_disp_path ../DORN/eigen697_depth.npy --gt_path /home/datasets/ --garg_crop --depth_provided --test_file utils/filenames/eigen_test_files_withGT.txt --shared_index utils/filenames/eigen692_652_shared_index.txt

This should give you


abs_rel,     sq_rel,        rms,    log_rms,     d1_all,         a1,         a2,         a3
    0.0807,     0.3326,      2.888,      0.120,      0.000,      0.938,      0.986,      0.995
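The filtering step in that script essentially amounts to indexing the 697 predictions with the shared-index file; a rough sketch of that step (the function name and the assumption that the index file is one integer per line are mine):

```python
import numpy as np

def select_shared(pred_697, index_path):
    """Keep only the predictions whose Eigen-split index appears in the
    shared-index file, i.e. the test images for which official
    semi-dense ground truth exists."""
    shared = np.loadtxt(index_path, dtype=int)   # one index per line (assumed)
    return pred_697[shared]
```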

Hope it helps. From there you can dig into why we are getting different results. You can also evaluate against the raw LiDAR; I am curious whether the discrepancy comes from the evaluation code or from the inference output.

Thank you very much for providing these details, they are very helpful!