This repository is the reference PyTorch implementation for training and testing depth estimation models using the method described in
Deep Digging into the Generalization of Self-supervised Monocular Depth Estimation
Jinwoo Bae, Sungho Moon and Sunghoon Im
AAAI 2023 (arXiv)
Our code is based on TRI's PackNet-SfM.
If you find our work useful in your research, please consider citing our paper:
@inproceedings{bae2022monoformer,
title={Deep Digging into the Generalization of Self-supervised Monocular Depth Estimation},
author={Bae, Jinwoo and Moon, Sungho and Im, Sunghoon},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2023}
}
conda create -n monoformer python=3.7
conda activate monoformer
git clone https://github.com/sjg02122/MonoFormer.git
cd MonoFormer
pip install -r requirements.txt
We ran our experiments with PyTorch 1.10.0+cu113, Python 3.7, an A6000 GPU, and Ubuntu 20.04.
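As a quick sanity check of the environment (the exact CUDA build on your machine may differ), you can verify that PyTorch is installed and sees the GPU from inside the conda environment:
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"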
We experiment extensively with modern backbone architectures (e.g., ConvNeXt, RegionViT). MF stands for MonoFormer.
| Model | Abs Rel | Sq Rel | RMSE | a1 |
|---|---|---|---|---|
| MF-hybrid | 0.104 | 0.846 | 4.580 | 0.891 |
| MF-ViT | 0.118 | 0.942 | 4.840 | 0.873 |
| MF-Twins | 0.125 | 1.309 | 4.973 | 0.866 |
| MF-RegionViT | 0.113 | 0.893 | 4.756 | 0.875 |
| MF-ConvNeXt | 0.111 | 0.760 | 4.533 | 0.878 |
| MF-SLaK | 0.117 | 0.866 | 4.811 | 0.878 |
You can configure your datasets in config.py or in the other config YAML files (DATA_PATH is your data root path). In our experiments, we use only the KITTI dataset for training. The other datasets (e.g., ETH3D and DeMoN) are used for testing.
The KITTI (raw) dataset can be downloaded from the KITTI website. If you want to download it from the command line, please use the download command provided by PackNet-SfM.
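Once downloaded, the data needs to sit under the root that your config points to. One option (the paths below are placeholders; adjust them to your own DATA_PATH and download location) is to symlink the extracted data under the configured root:
mkdir -p /data/datasets
ln -s /path/to/KITTI_raw /data/datasets/KITTI_raw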
You can download the texture-shifted datasets (Water, Pencil, and Style-transferred).
In our experiments, we use ETH3D, DeMoN (i.e., MVS, SUN3D, RGBD, Scenes11), and our generated texture-shifted datasets for evaluation.
This will be updated soon.
You can directly run inference on a single image or folder:
python3 scripts/infer.py --checkpoint <checkpoint.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
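For example, to run inference with a trained checkpoint on a single image (the checkpoint and image names below are placeholders, and 192 640 is a typical KITTI input resolution; check the script's --help for the exact --image_shape syntax):
python3 scripts/infer.py --checkpoint checkpoints/MF-hybrid.ckpt --input assets/example.png --output outputs/example_depth.png --image_shape 192 640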
You can also evaluate the model using:
python3 scripts/eval.py --checkpoint <checkpoint.ckpt> [--config <config.yaml>]
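For example (the checkpoint and config file names below are placeholders for your own files):
python3 scripts/eval.py --checkpoint checkpoints/MF-hybrid.ckpt --config configs/eval_kitti.yaml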
Our training procedure is similar to PackNet-SfM's. Any training, including fine-tuning, can be done by passing either a .yaml config file or a .ckpt model checkpoint to scripts/train.py:
python3 scripts/train.py <config.yaml or checkpoint.ckpt>
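For example, to start training from a config file, or to resume/fine-tune from a saved checkpoint (both file names below are placeholders):
python3 scripts/train.py configs/train_kitti.yaml
python3 scripts/train.py checkpoints/MF-hybrid.ckpt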