This repository provides the official code (Python + PyTorch) to run models from the "Towards General Purpose Geometry-Preserving Single-View Depth Estimation" paper:
Towards General Purpose Geometry-Preserving Single-View Depth Estimation
Mikhail Romanov, Nikolay Patatkin, Anna Vorontsova, Sergey Nikolenko, Anton Konushin, Dmitry Senyushkin
Samsung Research
https://arxiv.org/abs/2009.12419
You may also consider watching the video with additional results (animated point cloud renders) and comparisons with existing single-view depth estimation models.
To set up an Anaconda environment, use the following commands:
conda create -n efficient-de python=3.7
conda activate efficient-de
conda install pytorch==1.5.1 torchvision cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
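Optionally, you can sanity-check the environment with a short snippet (not part of the repository) that prints the installed versions and confirms CUDA is visible:

```python
# Optional sanity check: confirm the installed versions match the environment above.
import torch
import torchvision

print(torch.__version__)          # expected: 1.5.1
print(torchvision.__version__)
print(torch.cuda.is_available())  # True if the CUDA 10.2 runtime is usable
```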
We release the weights of our final models trained on a mixture of 4 datasets (RedWeb + DIML Indoor + MegaDepth + Stereo Movies). Download the desired checkpoints from the links below to the weights/ folder:
Encoder | Decoder | Link | NYU | TUM | Sintel | DIW | ETH3D | Params (M) | TFLOPS* |
---|---|---|---|---|---|---|---|---|---|
MobileNetV2 | LRN | mn-lrn4.pth | 14.64 | 15.13 | 0.360 | 15.02 | 0.191 | 2.4 | 1.17 |
EfficientNet-Lite0 | LRN | lite0-lrn4.pth | 14.15 | 14.41 | 0.354 | 14.59 | 0.177 | 3.6 | 1.29 |
EfficientNet-B0 | LRN | b0-lrn4.pth | 13.84 | 15.95 | 0.330 | 13.15 | 0.168 | 4.2 | 1.66 |
EfficientNet-B1 | LRN | b1-lrn4.pth | 12.80 | 15.03 | 0.315 | 12.71 | 0.179 | 6.7 | 2.22 |
EfficientNet-B2 | LRN | b2-lrn4.pth | 13.04 | 15.36 | 0.304 | 13.06 | 0.168 | 8 | 2.5 |
EfficientNet-B3 | LRN | b3-lrn4.pth | 12.35 | 14.38 | 0.343 | 12.95 | 0.176 | 11 | 3.61 |
EfficientNet-B4 | LRN | b4-lrn4.pth | 11.92 | 13.55 | 0.346 | 12.81 | 0.164 | 18 | 5.44 |
EfficientNet-B5 | LRN | b5-lrn4.pth | 10.64 | 13.05 | 0.328 | 12.56 | 0.154 | 29 | 8.07 |
\* TFLOPS are estimated for a single 384×384 image input.
For NYU and TUM, delta-1 metrics are given; for Sintel and ETH3D, relative errors; for DIW, WHDR (Weighted Human Disagreement Rate).
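For reference, here is a minimal sketch of these standard metrics (this is not the repository's evaluation code). It assumes `pred` and `gt` are NumPy arrays of positive depth values that have already been aligned in scale/shift:

```python
# A minimal sketch of the reported metrics (not the repository's evaluation code).
# Assumes `pred` and `gt` are NumPy arrays of positive, already-aligned depth values.
import numpy as np

def delta1(pred, gt):
    """Standard delta-1 accuracy: fraction of pixels with max(pred/gt, gt/pred) < 1.25 (NYU, TUM)."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float((ratio < 1.25).mean())

def abs_rel(pred, gt):
    """Mean absolute relative error (Sintel, ETH3D)."""
    return float(np.mean(np.abs(pred - gt) / gt))
```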
Model list: `mn_lrn4`, `lite0_lrn4`, `b0_lrn4`, `b1_lrn4`, `b2_lrn4`, `b3_lrn4`, `b4_lrn4`, `b5_lrn4`.
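The released checkpoints are regular PyTorch `.pth` files, so a download can be sanity-checked with a plain `torch.load` before running inference. The sketch below only assumes standard PyTorch behavior and uses the checkpoint name from the table above:

```python
# A minimal sketch (an assumption, not the repository's API): inspect a downloaded
# checkpoint to confirm it loads and contains parameter tensors.
import torch

state = torch.load("weights/b5-lrn4.pth", map_location="cpu")
if isinstance(state, dict):
    print(list(state.keys())[:5])  # print a few keys to confirm the file is intact
```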
To run inference on an image folder, use the following command:
python inference.py --src-dir <your-image-folder-path> --out-dir output/ --vis-dir vis/ --model b5_lrn4
By default, this creates an individual PyTorch `.pth` file for each image with the model prediction in the log-disparity domain. For convenience, you can also pass the `--domain` flag to switch to `disparity` (like MiDaS), `depth` (like most models trained on conventional sensor-based datasets), or `log-depth` (MegaDepth, Mannequin Challenge, etc.).
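The domains are related by simple element-wise transforms. The sketch below (not part of the repository) shows the conversions, assuming a saved prediction is a single torch tensor of log-disparity values; the file name is hypothetical:

```python
# A minimal sketch of how the output domains relate to each other.
# Assumes the saved .pth file stores a torch tensor of log-disparity values.
import torch

log_disp = torch.load("output/example.pth", map_location="cpu")  # hypothetical file name

disparity = torch.exp(log_disp)            # disparity domain (like MiDaS)
depth = 1.0 / disparity.clamp(min=1e-6)    # depth domain (inverse of disparity)
log_depth = -log_disp                      # log-depth domain, since log(1/d) = -log(d)
```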
- Download and prepare the evaluation datasets:
  - NYUv2. The official test split is used for evaluation. You can download the data from http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat (654 images).
- TUM. Here we use the evaluation subset proposed by Li et al. in "Learning the Depths of Moving People by Watching Frozen People" capturing people in an indoor environment. Download https://storage.googleapis.com/mannequinchallenge-data/tumtest/tum_hdf5.tgz and unpack the file. (1815 images)
- DIW. Test subset images and annotations can be downloaded from http://www-personal.umich.edu/~wfchen/depth-in-the-wild/ page.
  - Sintel. Download the training set images from http://sintel.is.tue.mpg.de/downloads and the depth maps from http://sintel.is.tue.mpg.de/depth (1064 images total, 19 of them with depth strictly larger than 72 meters). Keep the two folders "final" and "depth".
  - ETH3D. The original depth provided by the dataset is sparse. For evaluation purposes, we rendered depth maps from the clean point clouds, given the intrinsics and extrinsics, at 4x lower resolution than the original images. Our preprocessed version of the dataset can be downloaded from the link.
- Unpack the data and place it in a common folder so that it forms the following structure:
  data-folder/
      nyu_depth_v2_labeled.mat
      TUM/
          <file-id>.h5
      sintel/
          final/
          depth/
      DIW/
          DIW_test/
          DIW_test.csv
      ETH3D/
  Specify the path to your data folder in `config.py`.
- Run the script:
  python eval.py --model b5_lrn4
  By default, the script evaluates the model on all datasets (NYU, TUM, ETH3D, Sintel, DIW). You can also specify the datasets you need explicitly:
  python eval.py --ds nyu tum eth3d --model b5_lrn4
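To compare all released models, you can loop the documented `eval.py` CLI over the model list. A minimal sketch (this helper script is not part of the repository):

```python
# A minimal sketch: evaluate every released model on a chosen subset of datasets
# by calling the documented eval.py CLI in a loop.
import subprocess

MODELS = ["mn_lrn4", "lite0_lrn4", "b0_lrn4", "b1_lrn4",
          "b2_lrn4", "b3_lrn4", "b4_lrn4", "b5_lrn4"]

for model in MODELS:
    subprocess.run(["python", "eval.py", "--ds", "nyu", "tum", "--model", model], check=True)
```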
If you find this work useful for your research, please cite our paper:
@article{geometricde2021,
title={Towards General Purpose Geometry-Preserving Single-View Depth Estimation},
author={Mikhail Romanov and Nikolay Patatkin and Anna Vorontsova and Sergey Nikolenko and Anton Konushin and Dmitry Senyushkin},
year={2021},
eprint={2009.12419},
archivePrefix={arXiv}
}