Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume - ML Reproducibility Challenge 2020

This project is a reproduction of the CVPR 2020 paper

Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume

Adrian Johnston, Gustavo Carneiro

CVPR 2020 (arXiv pdf)

The paper proposes to close the performance gap with fully-supervised methods while training only on monocular sequences, with the help of two additional components in the depth network: a self-attention module and a discrete disparity volume.
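For intuition, the sketch below shows the idea behind the discrete disparity volume: disparity is discretised into K bins, a per-pixel softmax produces a probability over the bins, and the predicted disparity is the expectation over the bin values (the probability volume can also provide pixel-wise uncertainty). This is an illustrative PyTorch sketch, not the repository's code; the number of bins, the linear bin spacing and the tensor shapes are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DiscreteDisparityHead(nn.Module):
        """Illustrative discrete disparity volume: per-pixel softmax over K
        disparity bins, followed by the expectation (soft-argmax)."""

        def __init__(self, in_channels, num_bins=128, min_disp=0.01, max_disp=10.0):
            super().__init__()
            self.logits = nn.Conv2d(in_channels, num_bins, kernel_size=3, padding=1)
            # Candidate disparity value for each bin (linear spacing assumed here).
            bins = torch.linspace(min_disp, max_disp, num_bins).view(1, num_bins, 1, 1)
            self.register_buffer("bins", bins)

        def forward(self, features):
            probs = F.softmax(self.logits(features), dim=1)            # (B, K, H, W)
            disparity = (probs * self.bins).sum(dim=1, keepdim=True)   # expected disparity, (B, 1, H, W)
            return disparity, probs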

Setup procedure

  1. Clone the project from GitHub.
    Change to the directory Self-supervised-Monocular-Trained-Depth-Estimation-using-Self-attention-and-Discrete-Disparity-Volum.

  2. Install packages.
    Install the required packages by running the command below.

        pip install -r requirements.txt
    

    This project uses Python 3.6.6, CUDA 10.1, PyTorch 0.4.1, torchvision 0.2.1, tensorboardX 1.4 and OpenCV. The experiments were run on an NVIDIA Tesla P100 GPU with an Intel Xeon E5-2660 v4 CPU (2.0 GHz, 35 MB cache).

  3. Download the required data sets.
    The data sets used in this project are the KITTI raw data set and the leftImg8bit images of Cityscapes. A sketch of the assumed KITTI download and layout is given after this list.
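This reproduction builds on Monodepth2 (see References), so the KITTI raw data is assumed to follow Monodepth2's directory layout. If the repository ships Monodepth2's splits/kitti_archives_to_download.txt (an assumption worth checking), the raw data can be fetched and unpacked with:

    wget -i splits/kitti_archives_to_download.txt -P kitti_data/
    cd kitti_data && unzip "*.zip" && cd ..

so that images end up at paths such as kitti_data/2011_09_26/2011_09_26_drive_0001_sync/image_02/data/0000000000.png (keep the PNGs when training with the --png flag).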

Training

The paper claims to achieve state-of-the-art results using only monocular sequences for training, unlike previous algorithms, which relied on both stereo and monocular images.

python3 train.py --data_path <path to kitti dataset> --log_dir tmp/ --model_name <model> --png

To set or alter other input parameters for ablation studies or a hyperparameter search, refer to options.py. An example run with explicit hyperparameters is shown below.
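For example, a run with explicit hyperparameters might look like the following. The flag names follow Monodepth2's options.py, on which this reproduction is based, and the model name is hypothetical; check options.py for the exact set of supported flags.

python3 train.py --data_path <path to kitti dataset> --log_dir tmp/ --model_name mono_ddv_640x192 --height 192 --width 640 --batch_size 12 --learning_rate 1e-4 --num_epochs 20 --png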

Evaluation

Prepare the ground-truth depth data by running

python export_gt_depth.py --data_path kitti_data --split eigen --dataset <kitti or cityscapes>

The error and accuracy metrics of a trained model can be computed using the command below; the reported metrics are defined in the sketch that follows.

python evaluate_depth.py --data_path <path to kitti dataset> --load_weights_folder <trained model weights> --eval_mono --png
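For reference, the seven numbers reported in the Results section are the standard KITTI depth metrics. A minimal NumPy sketch of their usual definitions is shown below (illustrative, not necessarily the exact code in evaluate_depth.py; gt and pred are matched arrays of ground-truth and predicted depths in metres):

    import numpy as np

    def compute_depth_errors(gt, pred):
        """Standard KITTI depth metrics for matched ground-truth/predicted depths."""
        thresh = np.maximum(gt / pred, pred / gt)
        a1 = (thresh < 1.25).mean()            # accuracy under threshold 1.25
        a2 = (thresh < 1.25 ** 2).mean()
        a3 = (thresh < 1.25 ** 3).mean()
        abs_rel = np.mean(np.abs(gt - pred) / gt)
        sq_rel = np.mean(((gt - pred) ** 2) / gt)
        rmse = np.sqrt(np.mean((gt - pred) ** 2))
        rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
        return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3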

Inference

Inference outputs the depth map, the memory occupied by the model and the inference time for a given image file or folder of images; a timing sketch follows the command below.

python test_simple.py --image_path <image(s) path> --model_name <model weights>
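The reported inference times are averages over repeated forward passes. A minimal sketch of how such timing can be measured is shown below (illustrative only; the model and input tensor are placeholders, and the CUDA synchronisation calls matter only when timing on a GPU):

    import time
    import torch

    def time_inference(model, image, device="cuda", warmup=5, repeats=50):
        """Average forward-pass time in milliseconds for one input image."""
        model = model.to(device).eval()
        image = image.to(device)
        with torch.no_grad():
            for _ in range(warmup):                 # warm-up passes (not timed)
                model(image)
            if device.startswith("cuda"):
                torch.cuda.synchronize()            # flush queued GPU work
            start = time.perf_counter()
            for _ in range(repeats):
                model(image)
            if device.startswith("cuda"):
                torch.cuda.synchronize()
        return (time.perf_counter() - start) / repeats * 1000.0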

Results

Below are the results obtained on the KITTI raw test set (Eigen split) for the models trained in this project.

NOTE
The results obtained are system specific: different combinations of cuDNN and NVIDIA driver versions can produce slightly different numbers. To the best of my knowledge, reproducing the environment described above will give results in the same ballpark as those reported here. A sketch for reducing run-to-run variation is given after the tables.

    abs_rel    sq_rel    RMSE     RMSE log    a1       a2       a3
    0.108      93.13     4.682    0.185       0.889    0.962    0.982

    Training time    Inference time (CPU)    Inference time (GPU)    Memory
    204 hours        6108.5 +/- 12.23        653.21 +/- 0.98         252.7 MB
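To reduce run-to-run variation when trying to reproduce these numbers, a common PyTorch recipe is to fix the random seeds and request deterministic cuDNN kernels. This is a general sketch and not part of the reproduced training code:

    import random
    import numpy as np
    import torch

    def seed_everything(seed=42):
        """Fix random seeds and request deterministic cuDNN behaviour."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True   # prefer deterministic kernels
        torch.backends.cudnn.benchmark = False      # disable autotuning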

References

  1. Monodepth2 - https://github.com/nianticlabs/monodepth2
  2. OCNet - https://github.com/openseg-group/OCNet.pytorch
  3. DORN - https://arxiv.org/abs/1806.02446