MiDaS: A Java repository from Glass Imaging

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

This repository contains code to compute depth from a single image. It accompanies our paper:

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

and our preprint:

Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

MiDaS was trained on 10 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS) with multi-objective optimization. The original model that was trained on 5 datasets (MIX 5 in the paper) can be found here.

Changelog

[Apr 2021] Released MiDaS v3.0:
- New models based on Dense Prediction Transformers are on average 21% more accurate than MiDaS v2.1
- Additional models can be found here
[Nov 2020] Released MiDaS v2.1:
- New model that was trained on 10 datasets and is on average about 10% more accurate than MiDaS v2.0
- New light-weight model that achieves real-time performance on mobile platforms.
- Sample applications for iOS and Android
- ROS package for easy deployment on robots
[Jul 2020] Added TensorFlow and ONNX code. Added online demo.
[Dec 2019] Released new version of MiDaS - the new model is significantly more accurate and robust
[Jul 2019] Initial release of MiDaS (Link)

Setup

Pick one or more models and download the corresponding weights to the weights folder:

- For highest quality: dpt_large
- For moderately lower quality, but better speed on CPU and slower GPUs: dpt_hybrid
- For real-time applications on resource-constrained devices: midas_v21_small
- Legacy convolutional model: midas_v21

Set up dependencies:
```
conda install pytorch torchvision opencv
pip install timm
```
The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.

Usage

Place one or more input images in the folder input.

Run the model:

```
python run.py --model_type dpt_large
python run.py --model_type dpt_hybrid
python run.py --model_type midas_v21_small
python run.py --model_type midas_v21
```

The resulting inverse depth maps are written to the output folder.
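Inverse depth means that larger values correspond to points closer to the camera, and the predictions are relative rather than metric. To view such a map as an image, it is common to min-max scale it to an integer range. The sketch below illustrates that arithmetic on a plain nested list; the function name `normalize_inverse_depth` and the `bits` parameter are illustrative and are not taken from run.py.

```python
def normalize_inverse_depth(inv_depth, bits=16):
    """Min-max scale a 2D inverse depth map to integers in [0, 2**bits - 1].

    inv_depth is a list of rows (hypothetical input format for this sketch).
    """
    max_val = 2 ** bits - 1
    flat = [v for row in inv_depth for v in row]
    lo, hi = min(flat), max(flat)
    if hi - lo <= 0:
        # Constant map: there is no depth variation to display.
        return [[0 for _ in row] for row in inv_depth]
    scale = max_val / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in inv_depth]


# Toy 2x2 prediction; the largest value (4.5) is the nearest point.
pred = [[0.5, 1.0], [2.0, 4.5]]
print(normalize_inverse_depth(pred, bits=8))  # [[0, 32], [96, 255]]
```

With NumPy the same arithmetic applies elementwise to an array. Because the scaling is recomputed per image, pixel values are not comparable across different images.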

via Docker

Make sure you have installed Docker and the NVIDIA Docker runtime.
Build the Docker image:
```
docker build -t midas .
```
Run inference:
```
docker run --rm --gpus all -v $PWD/input:/opt/MiDaS/input -v $PWD/output:/opt/MiDaS/output midas
```
This command passes through all of your NVIDIA GPUs to the container, mounts the input and output directories and then runs the inference.

via PyTorch Hub

The pretrained model is also available on PyTorch Hub.
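A minimal sketch of loading a model through PyTorch Hub. It assumes the upstream hub repository name `intel-isl/MiDaS` and its entry points (`DPT_Large`, `DPT_Hybrid`, `MiDaS`, `MiDaS_small`, `transforms`); none of these are stated in this README, so check them against the hub page before relying on them. `HUB_ENTRY` and `load_from_hub` are illustrative names.

```python
# Assumed mapping from this README's --model_type values to hub entry points.
HUB_ENTRY = {
    "dpt_large": "DPT_Large",
    "dpt_hybrid": "DPT_Hybrid",
    "midas_v21": "MiDaS",
    "midas_v21_small": "MiDaS_small",
}


def load_from_hub(model_type, repo="intel-isl/MiDaS"):
    """Return (model, transform) for one of the model types listed in Setup.

    Downloads code and weights on first use, so it needs network access.
    """
    import torch  # imported lazily so the mapping is usable without PyTorch

    model = torch.hub.load(repo, HUB_ENTRY[model_type])
    model.eval()

    # The hub also exposes the matching input preprocessing.
    transforms = torch.hub.load(repo, "transforms")
    if model_type.startswith("dpt"):
        transform = transforms.dpt_transform
    elif model_type == "midas_v21_small":
        transform = transforms.small_transform
    else:
        transform = transforms.default_transform
    return model, transform


# Example (not executed here because it downloads weights):
# model, transform = load_from_hub("dpt_hybrid")
```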

via TensorFlow or ONNX

See README in the tf subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

via Mobile (iOS / Android)

See README in the mobile subdirectory.

via ROS1 (Robot Operating System)

See README in the ros subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

Accuracy

Zero-shot error (lower is better) and speed in FPS (higher is better):

Model	DIW, WHDR	Eth3d, AbsRel	Sintel, AbsRel	Kitti, δ>1.25	NyuDepthV2, δ>1.25	TUM, δ>1.25	Speed, FPS
Small models:							iPhone 11
MiDaS v2 small	0.1248	0.1550	0.3300	21.81	15.73	17.00	0.6
MiDaS v2.1 small	0.1344	0.1344	0.3370	29.27	13.43	14.53	30

Big models:							GPU RTX 3090
MiDaS v2 large	0.1246	0.1290	0.3270	23.90	9.55	14.29	51
MiDaS v2.1 large	0.1295	0.1155	0.3285	16.08	8.71	12.51	51
MiDaS v3.0 DPT-Hybrid	0.1106	0.0934	0.2741	11.56	8.69	10.89	46
MiDaS v3.0 DPT-Large	0.1082	0.0888	0.2697	8.46	8.32	9.97	47
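
To make the column headers concrete, here is a small sketch of two of the error measures as they are commonly defined: AbsRel is the mean of |d - d*| / d*, and δ>1.25 is the percentage of pixels where max(d / d*, d* / d) exceeds 1.25. The sketch assumes the prediction has already been aligned to the ground truth (relative predictions need a scale and shift fit first) and uses made-up numbers; the function names are illustrative and this is not the evaluation code behind the table.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over valid (positive) ground-truth pixels."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta_error_percent(pred, gt, threshold=1.25):
    """Percentage of pixels whose ratio max(p/g, g/p) exceeds the threshold."""
    bad = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) > threshold)
    return 100.0 * bad / len(gt)


# Four toy pixels (hypothetical values, already aligned).
gt = [1.0, 2.0, 4.0, 5.0]
pred = [1.1, 2.0, 3.0, 7.0]

print(f"AbsRel: {abs_rel(pred, gt):.4f}")              # AbsRel: 0.1875
print(f"delta>1.25: {delta_error_percent(pred, gt)}%")  # delta>1.25: 50.0%
```

Here the last two pixels have ratios of about 1.33 and 1.4, so half of the pixels count as errors under the 1.25 threshold.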

Citation

Please cite our paper if you use this code or any of the models:

@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}

If you use a DPT-based model, please also cite:

@article{Ranftl2021,
	author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
	title     = {Vision Transformers for Dense Prediction},
	journal   = {ArXiv preprint},
	year      = {2021},
}

Acknowledgements

Our work builds on and uses code from timm. We'd like to thank the author for making this library available.

License

MIT License
