
This repository provides a fair environment in which Self-supervised Monocular Depth Estimation (SMDE) methods can be evaluated and developed.

Self-supervised Monocular Depth Estimation with PyTorch

Welcome to V2.0

In V2.0, you can easily compute FLOPs (supported by thop) and inference speeds. We also support more flexible training configurations, such as dividing one training iteration into multiple steps and setting different loss functions for different parameters (e.g. as used in TiO-Depth). We have tried our best to update all the methods from V1.0 to V2.0, and we hope this is helpful. In addition, our new method TiO-Depth was accepted to ICCV 2023 and is included in this repo.

About SMDE-Pytorch

We built this repository with PyTorch for evaluating and developing Self-supervised Monocular Depth Estimation (SMDE) methods. The main goals of SMDE-Pytorch are:

  • Predict depths with typical SMDE methods (with their pretrained models) with simple commands.
  • Evaluate the performances (including the FLOPs and speed) of the SMDE methods more fairly.
  • Train and modify the existing SMDE methods simply.
  • Develop your methods quickly with the modular network parts.

If you have any questions or suggestions, please open an issue or contact us at zm_zhou1998@163.com (replies may be delayed due to work). If you like this work, a Star would make us happy~


We built and tested the repository with Ubuntu 18.04, CUDA 11.0, Python 3.7.9, and PyTorch 1.7.0. To use this repository, we recommend creating a virtual environment with Anaconda. Please open a terminal in the root of the repository folder to run the following commands and scripts.

conda env create -f environment.yml
conda activate pytorch170cu11

Method Zoo

Method Ref. Test Train Paper Code
Monodepth2 2019 ICCV Link Link
DepthHints 2019 ICCV Link Link
EdgeOfDepth 2020 CVPR Link Link
PackNet 2020 CVPR Link Link
P2Net 2020 ECCV Link Link
FAL-Net 2020 NeurIPS Link Link
HRDepth 2021 AAAI Link Link
DIFFNet 2021 BMVC Link Link
ManyDepth 2021 CVPR Link Link
EPCDepth 2021 ICCV Link Link
FSRE-Depth 2021 ICCV Link Link
R-MSFM 2021 ICCV Link Link
OCFD-Net (Ours) 2022 ACM-MM Link Link
SDFA-Net (Ours) 2022 ECCV Link Link
TiO-Depth (Ours) 2023 ICCV Link Link
  • Test: You can predict depths with the pretrained models provided by the official implementations. We have tested their performance, and more details are given on their pages (click their names in the table).
  • Train: We have trained the method with this repository, and the trained model achieves competitive or better performance compared to the official version.


  • SuperDepth (ICRA 2019)

Evaluation Results

We report the performance of the methods on the KITTI raw test set (an outdoor dataset) to help you choose a model. More pretrained models are given on their pages (click their names in the above table).

Method Info. Sup Trained Abs Rel. Sq Rel. RMSE RMSElog A1
ManyDepth(Mono) Res18+192x640 Mono Official 0.118 0.891 4.763 0.192 0.871
PackNet PackV1+192x640 Mono Official 0.110 0.836 4.655 0.187 0.881
R-MSFM6 Res18+192x640 Mono Trained 0.110 0.797 4.646 0.188 0.880
Monodepth2 Res18+320x1024 Mono Trained 0.109 0.797 4.533 0.184 0.888
FSRE-Depth Res18+192x640 Mono Trained 0.107 0.751 4.525 0.182 0.886
Monodepth2 Res18+320x1024 Stereo Trained 0.104 0.824 4.747 0.200 0.875
HRDepth Res18+384x1280 Mono Trained 0.102 0.719 4.396 0.178 0.897
FAL-NetB N=49+375x1242 Stereo Trained 0.099 0.625 4.197 0.182 0.885
DIFFNet HR18+320x1024 Mono Trained 0.099 0.688 4.345 0.176 0.901
DepthHints Res50+320x1024 Stereo Trained 0.094 0.680 4.333 0.181 0.894
EdgeOfDepth Res50+320x1024 Stereo Official 0.092 0.647 4.247 0.177 0.897
OCFD-Net Res50+384x1280 Stereo Trained 0.091 0.576 4.036 0.174 0.901
EPCDepth Res50+320x1024 Stereo Trained 0.090 0.682 4.282 0.178 0.903
SDFA-Net SwinT*+384x1280 Stereo Trained 0.089 0.537 3.895 0.169 0.906
TiO-Depth SwinT*+384x1280 Stereo Trained 0.085 0.544 3.919 0.169 0.911

Results of the methods on the NYU v2 test set (an indoor dataset):

Method Info. Sup Trained Abs Rel. RMSE log10 A1
P2Net Res18+5f+288x384 Mono Official 0.149 0.556 0.063 0.797
  • Official means that the results are predicted with the models obtained from the official implementations.
  • Trained means that the results are predicted with the models trained with this repository.
  • The code for all the download links is smde.
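
The error columns in the tables above (Abs Rel., Sq Rel., RMSE, RMSElog, A1) can be illustrated with the following minimal sketch. It uses the definitions commonly adopted in the depth estimation literature and plain Python lists for clarity; the function name depth_metrics is hypothetical, and the repository's own evaluation code may differ in details such as depth capping, cropping, or masking of invalid pixels.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth errors over paired, valid (positive) depths.

    pred and gt are equally long sequences of depths in the same unit.
    This is an illustrative sketch, not the repository's implementation.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and equally long")
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Mean absolute error relative to the ground truth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean squared error relative to the ground truth.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root mean squared error in depth and in log-depth.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    # A1: fraction of samples whose ratio to the ground truth is below 1.25.
    a1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "a1": a1,
    }
```

Lower is better for the four error metrics, while higher is better for A1.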

Predict depth for your image(s) straightforwardly

To predict depth maps for your images, first download the pretrained model that you are interested in from the column named Trained in the above table. After unzipping the downloaded model, you can predict the depth maps for your images with:

python predict.py\
 --image_path <path to your image or folder name for your images>\
 --exp_opts <path to the method training option>\
 --model_path <path to the downloaded or trained model>

You can also set --input_size to choose the size to which the images are resized before they are fed to the model. If you want to predict on the CPU, please set --cpu. The depth results <image name>_pred.npy and the visualization results <image name>_visual.png will be saved in the same folder as the input images.
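
If you prefer to launch predictions from a Python script, the flags described above can be assembled as in the following sketch. The helper name build_predict_command is hypothetical and not part of the repository; it only uses the flags shown in this README (--image_path, --exp_opts, --model_path, --input_size, --cpu).

```python
def build_predict_command(image_path, exp_opts, model_path,
                          input_size=None, cpu=False):
    """Build the argument list for predict.py (hypothetical helper).

    input_size is an optional pair of integers passed in the same order
    as on the command line, e.g. (384, 1280).
    """
    cmd = [
        "python", "predict.py",
        "--image_path", str(image_path),
        "--exp_opts", str(exp_opts),
        "--model_path", str(model_path),
    ]
    if input_size is not None:
        first, second = input_size
        cmd += ["--input_size", str(first), str(second)]
    if cpu:
        # Run the prediction on the CPU instead of the GPU.
        cmd.append("--cpu")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the root of the repository.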

For example, if you want to predict depths from the images in ./example_images with Monodepth2 (using the model that was saved in pretrained_models/MD2_S_320_bs4/model/best_model.pth), you could use:

python predict.py\
 --image_path example_images\
 --exp_opts options/Monodepth2/train/monodepth2-res18_320_kitti_stereo.yaml\
 --model_path pretrained_models/MD2_S_320_bs4/model/best_model.pth

For the methods which cannot be trained in the repository yet, you can use the options in options/_base/networks for --exp_opts. Specifically, you can use the following command to predict depths with PackNet and the pretrained model saved in pretrained_models/PackNet_M_192_OI/model/PackNet_M_192.pth:

python predict.py\
 --image_path example_images\
 --exp_opts options/_base/networks/packnet.yaml\
 --model_path pretrained_models/PackNet_M_192_OI/model/PackNet_M_192.pth

The default image size in options/_base/networks/packnet.yaml is 192x640, so if you want to use the model trained at 384x1280, you should also set --input_size:

python predict.py\
 --image_path example_images\
 --exp_opts options/_base/networks/packnet.yaml\
 --model_path pretrained_models/PackNet_Mv_CS+K_384_OI/model/PackNet_Mv_CS+K_384.pth\
 --input_size 384 1280

Prepare datasets

Before evaluating or training the methods, you should download the required datasets. The following datasets can be used for training or evaluation:

Dataset Train Test
KITTI ✔ (175GB) ✔ (2GB)
NYU v2 - ✔ (2GB)
Make3D - ✔ (200MB)
Cityscapes ✔ (130GB) ✔ (35GB)
KITTI Stereo 2015 - ✔ (2GB)

Set data path

We provide an example file, path_example.py, for setting the paths in the repository. Please create a Python file named path_my.py and copy the code in path_example.py into it. Then replace the paths in path_my.py with your own folders. The folder for each dataset should be organized like:

<root of kitti>
|   |---2011_09_26_drive_0001_sync
|   |   |---image_02
|   |   |---image_03
|   |   |---velodyne_points
|   |   |---...
|   |---2011_09_26_drive_0002_sync
|   |   |---image_02
|   |   |---image_03
|   |   |---velodyne_points
|   |   |---...
|   '''
|   |--- ...
|---gt_depths_raw.npz (for raw Eigen test set)
|---gt_depths_improved.npz (for improved Eigen test set)
<root of NYU v2 (just test set)>
<root of Make3D>
|   |---depth_sph_corr-10.21op2-p-015t000.mat
|   |---depth_sph_corr-10.21op2-p-139t000.mat
|   |---...
|   |---img-10.21op2-p-015t000.jpg
|   |---img-10.21op2-p-139t000.jpg
|   |---...
<root of cityscapes>
|   |---train
|   |   |---aachen
|   |   |   |---aachen_000000_000019_leftImg8bit.png
|   |   |   |---aachen_000001_000019_leftImg8bit.png
|   |   |   |---...
|   |   |---bochum
|   |   |---...
|   |---train_extra
|   |   |---augsburg
|   |   |---...
|   |---test
|   |   |---...
|   |---val
|   |   |---...
|   |--- ...
|   |--- ...
|   |--- ...
|---gt_depths (for evaluation)
|   |---000_depth.npy
|   |---001_depth.npy
|   |--- ...
<root of kitti 2015>
|   |---image_2
|   |   |---000000_10.png
|   |   |---000000_11.png
|   |   |---000001_10.png
|   |   |---...
|   |---image_3
|   |   |---000000_10.png
|   |   |---000000_11.png
|   |   |---000001_10.png
|   |   |---...
|   |---disp_occ_0
|   |   |---000000_10.png
|   |   |---000000_11.png
|   |   |---000001_10.png
|   '''
|   |--- ...
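
Before running training or evaluation, it can help to check that a dataset folder matches the layout above. The following sketch does this for the KITTI root only; the function name check_kitti_root is hypothetical and not part of the repository. It searches for the *_sync drive folders at any depth and treats gt_depths_raw.npz as required, which only applies if you evaluate on the raw Eigen test set.

```python
from pathlib import Path


def check_kitti_root(root):
    """Return a list of problems found under a KITTI root (empty if none).

    Illustrative sketch: it only checks the folders and files named in
    the layout above, not the image or point-cloud contents.
    """
    root = Path(root)
    problems = []
    # Ground-truth file used for evaluation on the raw Eigen test set.
    if not (root / "gt_depths_raw.npz").is_file():
        problems.append("missing gt_depths_raw.npz")
    # Drive folders such as 2011_09_26_drive_0001_sync, at any depth.
    drives = sorted(p for p in root.rglob("*_sync") if p.is_dir())
    if not drives:
        problems.append("no *_sync drive folders found")
    for drive in drives:
        for sub in ("image_02", "image_03", "velodyne_points"):
            if not (drive / sub).is_dir():
                problems.append(f"{drive.name}: missing {sub}")
    return problems
```

An empty returned list means that every drive folder found contains image_02, image_03 and velodyne_points, and that the ground-truth file is present.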

KITTI

To train the methods on the KITTI dataset (the Eigen split), you should download the entire KITTI dataset (about 175GB) with:

wget -i ./datasets/kitti_archives_to_download.txt -P <save path>

And you could unzip them with:

cd <save path>
unzip "*.zip"
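
On systems without the unzip tool, the same extraction step can be sketched with Python's standard zipfile module. The function name extract_zip_archives is hypothetical; the sketch extracts every .zip archive found directly in the given folder into that folder, mirroring the two commands above.

```python
import zipfile
from pathlib import Path


def extract_zip_archives(folder):
    """Extract every *.zip file in folder into folder itself.

    Returns the names of the archives that were extracted, in sorted order.
    """
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Files keep the relative paths stored inside the archive.
            zf.extractall(folder)
        extracted.append(archive.name)
    return extracted
```

For example, extract_zip_archives("<save path>") would process all the archives downloaded by the wget command above.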

To evaluate the methods on KITTI (the raw Eigen test set), you should further generate the ground-truth depth file (as done in Monodepth2) with:

python datasets/utils/export_kitti_gt_depth.py --data_path <root of KITTI> --split raw

If you want to evaluate the method on the KITTI improved test set, you should download the annotated depth maps (about 15GB) Here and unzip them. Then you can generate the improved ground-truth depth file with:

python datasets/utils/export_kitti_gt_depth.py --data_path <root of KITTI> --split improved

As an alternative, we provide the Eigen test subset (with .png images Here or with .jpg images Here, about 2GB) and the generated gt_depth files for the people who just want to do the evaluation.


NYU v2

We use the NYUv2 test set as done in P2Net and EPCDepth, which can be downloaded Here.


Make3D

We use the Make3D test set for evaluating some methods, which can be downloaded Here.


Cityscapes

Cityscapes can be used to jointly train the model with KITTI, which helps improve the performance of the model. If you want to use Cityscapes, please download the following parts of the dataset Here and unzip them to your <root of cityscapes> (Note: for some files, you need to apply for download permission by email):

leftImg8bit_trainvaltest.zip (11GB)  <- Download this if you only run the evaluation
leftImg8bit_trainextra.zip (44GB)
rightImg8bit_trainvaltest.zip (11GB)
rightImg8bit_trainextra.zip (44GB)
disparity_trainvaltest.zip (3.5GB)
disparity_trainextra.zip (15GB)
camera_trainvaltest.zip (2MB)  <- Download this if you only run the evaluation
camera_trainextra.zip (8MB)

Then, please generate the camera parameter matrices by:

python datasets/utils/export_cityscapes_matrix.py

You also need to download the prepared ground-truth depth Here which is provided by Watson in ManyDepth.

KITTI Stereo 2015

To evaluate the model on the KITTI Stereo 2015 training set, as many stereo matching methods do, you should download the corresponding dataset Here and unzip it. Note that training the model still requires the entire KITTI dataset.

Evaluate the methods

To evaluate the methods on the prepared dataset, you could simply use

python evaluate.py\
 --exp_opts <path to the method EVALUATION option>\
 --model_path <path to the downloaded or trained model>

We provide the EVALUATION option files in options/<Method Name>/eval/*. Here we introduce some important arguments.

Argument Information
--metric_name depth_kitti_mono Enable the median scaling for the methods trained with monocular sequences (Sup = Mono)
--visual_list The samples whose outputs you want to save (path to a .txt file)
--save_pred Save the predicted depths of the samples which are in --visual_list
--save_visual Save the visualization results of the samples which are in --visual_list
-fpp, -gpp, -mspp Adopt different post-processing steps (please choose only one at a time)
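
Methods trained only with monocular sequences predict depth up to an unknown scale, which is why median scaling is enabled for them. A minimal sketch of the idea is given below, using plain Python lists; the function name median_scale is hypothetical, and the repository's own implementation may differ (for example, in how invalid pixels are masked).

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale predictions so that their median matches the ground truth.

    pred and gt are sequences of valid (positive) depths for one image.
    Returns the rescaled predictions and the scale factor that was used.
    """
    scale = median(gt) / median(pred)
    return [p * scale for p in pred], scale
```

The scale factor is computed per image, and the error metrics are then evaluated on the rescaled predictions.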

The output files are saved in eval_res/ by default. Please check evaluate.py for more information about the arguments.

For example, to evaluate Monodepth2 on the KITTI Eigen test set with the post-processing proposed by Godard and to save the visualization and predicted depths of all the test samples, please use:

python evaluate.py\
 --exp_opts options/Monodepth2/eval/monodepth2-res18-stereo_320_kitti.yaml\
 --model_path pretrained_models/MD2_S_320_bs4/model/best_model.pth\
 --visual_list data_splits/kitti/test_list.txt\
 --save_pred\
 --save_visual\
 -gpp

The evaluation output will be like

->Load the test dataset
->Load the pretrained model
->Use the post processing
->Start Evaluation
    | abs_rel  |  sq_rel  |   rms    | log_rms  |    a1    |    a2    |    a3    |
    |     0.102|     0.795|     4.685|     0.198|     0.876|     0.954|     0.977|

The output predicted depths and visualization results will be saved in eval_res/MD2_S_320_bs4/-gpp/*.

Train the methods

To train (reproduce) the methods on the prepared dataset, you could simply use the commands provided in options/<Method Name>/train/train_scripts.sh.

For example, if you want to train Monodepth2 on the KITTI dataset with stereo image pairs, please use:

 --name MD2-Res50_192_B12_S\
 --exp_opts options/Monodepth2/train/monodepth2-res18_192_kitti_stereo.yaml\
 --batch_size 12\
 --beta1 0.9\
 --epoch 20\
 --decay_step 15\
 --decay_rate 0.1\
 --save_freq 10\
 --visual_freq 2000
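
The --decay_step and --decay_rate values above can be read as a step learning-rate schedule. The sketch below assumes that the learning rate is multiplied by --decay_rate once every --decay_step epochs, which this README does not state explicitly; the function name step_decay_lr and the base learning rate in the usage note are hypothetical.

```python
def step_decay_lr(base_lr, epoch, decay_step, decay_rate):
    """Learning rate at a 0-based epoch under an assumed step-decay rule.

    The rate is multiplied by decay_rate once for every decay_step
    completed epochs.
    """
    return base_lr * decay_rate ** (epoch // decay_step)


def schedule(base_lr, epochs, decay_step, decay_rate):
    """List the learning rate for every epoch of a training run."""
    return [
        step_decay_lr(base_lr, epoch, decay_step, decay_rate)
        for epoch in range(epochs)
    ]
```

With --epoch 20, --decay_step 15 and --decay_rate 0.1, schedule(1e-4, 20, 15, 0.1) keeps the assumed base rate for epochs 0 to 14 and uses one tenth of it for epochs 15 to 19. A single-milestone reading of --decay_step would give the same values for a 20-epoch run.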

Modify the methods

coming soon


