
Towards General Purpose, Geometry Preserving Single-View Depth Estimation [Paper] [Weights] [Video]


This repository provides the official code (Python + PyTorch) to run models from the "Towards General Purpose, Geometry Preserving Single-View Depth Estimation" paper:

Towards General Purpose Geometry-Preserving Single-View Depth Estimation
Mikhail Romanov, Nikolay Patatkin, Anna Vorontsova, Sergey Nikolenko, Anton Konushin, Dmitry Senyushkin
Samsung Research
https://arxiv.org/abs/2009.12419

[Teaser figure]

You may also want to watch the video with additional results (animated point-cloud renders) and comparisons with existing single-view depth estimation models.

Set up an environment

To set up an Anaconda environment, use the following commands:

conda create -n efficient-de python=3.7
conda activate efficient-de
conda install pytorch==1.5.1 torchvision cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

Checkpoints

We release the weights of our final models trained on a mixture of four datasets (RedWeb + DIML Indoor + MegaDepth + Stereo Movies). Download the desired checkpoints from the links below into the weights/ folder:

| Encoder | Decoder | Link | NYU | TUM | Sintel | DIW | ETH3D | Params (M) | TFLOPS* |
|---|---|---|---|---|---|---|---|---|---|
| MobileNetV2 | LRN | mn-lrn4.pth | 14.64 | 15.13 | 0.360 | 15.02 | 0.191 | 2.4 | 1.17 |
| EfficientNet-Lite0 | LRN | lite0-lrn4.pth | 14.15 | 14.41 | 0.354 | 14.59 | 0.177 | 3.6 | 1.29 |
| EfficientNet-B0 | LRN | b0-lrn4.pth | 13.84 | 15.95 | 0.330 | 13.15 | 0.168 | 4.2 | 1.66 |
| EfficientNet-B1 | LRN | b1-lrn4.pth | 12.80 | 15.03 | 0.315 | 12.71 | 0.179 | 6.7 | 2.22 |
| EfficientNet-B2 | LRN | b2-lrn4.pth | 13.04 | 15.36 | 0.304 | 13.06 | 0.168 | 8 | 2.5 |
| EfficientNet-B3 | LRN | b3-lrn4.pth | 12.35 | 14.38 | 0.343 | 12.95 | 0.176 | 11 | 3.61 |
| EfficientNet-B4 | LRN | b4-lrn4.pth | 11.92 | 13.55 | 0.346 | 12.81 | 0.164 | 18 | 5.44 |
| EfficientNet-B5 | LRN | b5-lrn4.pth | 10.64 | 13.05 | 0.328 | 12.56 | 0.154 | 29 | 8.07 |

  • TFLOPS are estimated for a single 384×384 image input

For the NYU and TUM datasets, delta-1 metrics are reported; for Sintel and ETH3D, relative errors; and for DIW, WHDR (Weighted Human Disagreement Rate).
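
For reference, here is a minimal sketch of the delta-1 and relative-error definitions used above. This is not the repository's evaluation code, and the scale/shift alignment of predictions to ground truth that these benchmarks require is omitted:

```python
# Standard dense-depth metrics (sketch): delta-1 accuracy and absolute
# relative error. WHDR on DIW is computed differently, from the predicted
# ordinal relation of annotated point pairs versus human labels.
import torch

def delta1(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Fraction of valid pixels where max(pred/gt, gt/pred) < 1.25."""
    valid = gt > 0
    ratio = torch.max(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return (ratio < 1.25).float().mean().item()

def abs_rel(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Mean absolute relative error |pred - gt| / gt over valid pixels."""
    valid = gt > 0
    return ((pred[valid] - gt[valid]).abs() / gt[valid]).mean().item()
```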

Usage

Model list: mn_lrn4, lite0_lrn4, b0_lrn4, b1_lrn4, b2_lrn4, b3_lrn4, b4_lrn4, b5_lrn4.

To run inference on an image folder, use the following command:

python inference.py --src-dir <your-image-folder-path> --out-dir output/ --vis-dir vis/ --model b5_lrn4

By default, this creates an individual PyTorch .pth file with the model prediction in the log-disparity domain for each input image. For convenience, you can also pass the --domain flag to switch the output to disparity (like MiDaS), depth (like most models trained on conventional sensor-based datasets), or log-depth (MegaDepth, Mannequin, etc.).
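
To illustrate how these domains relate, the sketch below converts a saved prediction between them. It assumes the output .pth file contains a single 2-D tensor of per-pixel log-disparity values (the file name here is hypothetical and the actual layout may differ), and depth is recovered only up to an unknown scale:

```python
# Minimal sketch, assuming "output/example.pth" holds one 2-D tensor of
# log-disparity values produced with the default --domain setting.
import torch

pred = torch.load("output/example.pth")      # log-disparity (default domain)

disparity = torch.exp(pred)                  # disparity domain (MiDaS-style)
depth = 1.0 / disparity.clamp(min=1e-6)      # depth domain, defined up to scale
log_depth = torch.log(depth)                 # log-depth domain (effectively -pred)
```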

Evaluation

  1. Download and prepare the evaluation datasets (NYU Depth v2, TUM, Sintel, DIW, ETH3D).

  2. Unpack the data and place it in a common folder so that it forms the following structure:

    data-folder/
         nyu_depth_v2_labeled.mat
         TUM/
            <file-id>.h5
         sintel/
            final/
            depth/
         DIW/
            DIW_test/
            DIW_test.csv
         ETH3D/
    

    Specify the path to your data-folder in config.py (see the sketch after this list).

  3. Run the evaluation script: python eval.py --model b5_lrn4. By default, the script evaluates the model on all datasets (NYU, TUM, ETH3D, Sintel, DIW). You can also specify the datasets you need explicitly: python eval.py --ds nyu tum eth3d --model b5_lrn4.
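
For step 2, a minimal config.py sketch is shown below. The variable name is illustrative only; use whatever name the repository's config.py already defines for the dataset root:

```python
# config.py (sketch) -- hypothetical variable name; check the actual config.py
# for the key the evaluation scripts expect.
DATA_ROOT = "/path/to/data-folder"
```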

Citation

If you find this work useful for your research, please cite our paper:

@misc{geometricde2021,
      title={Towards General Purpose Geometry-Preserving Single-View Depth Estimation},
      author={Mikhail Romanov and Nikolay Patatkin and Anna Vorontsova and Sergey Nikolenko and Anton Konushin and Dmitry Senyushkin},
      year={2021},
      eprint={2009.12419},
      archivePrefix={arXiv}
}