Monocular and stereo depth estimation with regression, trained and evaluated on the raw KITTI dataset. For training, dense depth maps are generated with a Sparse-to-Dense network (PyTorch implementation); evaluation is on sparse depth maps (Eigen split). The following models are available: DispNetS and ResNet autoencoders with 18, 34, 50, 101, or 152 layers.
Loss includes:
- regression loss (reversed Huber, a.k.a. berHu, loss)
- smoothing loss
- occlusion loss
- disparity consistency loss (in stereo mode)
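The repository's exact loss code is not shown here; below is a minimal sketch of the reversed Huber (berHu) regression term following Laina et al. (2016). The threshold choice c = 0.2 · max|error| is the value used in that paper and is an assumption about this implementation; the function name `berhu_loss` is illustrative.

```python
# Sketch of the reversed Huber (berHu) loss: L1 for small errors,
# scaled L2 for large ones, as in Laina et al. (2016).
import torch

def berhu_loss(pred, target):
    """berHu loss: |d| if |d| <= c, else (d^2 + c^2) / (2c)."""
    diff = (pred - target).abs()
    # Batch-dependent threshold; clamp avoids division by zero
    # when pred == target everywhere.
    c = (0.2 * diff.max()).clamp(min=1e-6)
    l2_branch = (diff ** 2 + c ** 2) / (2 * c)
    return torch.where(diff <= c, diff, l2_branch).mean()
```

The L2 branch penalizes large residuals more heavily than plain L1, which Laina et al. found helpful for depth regression where most errors are small but a heavy-tailed minority dominates.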
The network was inspired by the following papers:
- Laina et al. "Deeper Depth Prediction with Fully Convolutional Residual Networks" (2016)
- Godard et al. "Unsupervised Monocular Depth Estimation with Left-Right Consistency" (2016)
- Kuznietsov et al. "Semi-Supervised Deep Learning for Monocular Depth Map Prediction" (2017)
- Radwan et al. "VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry" (2018)
- Godard et al. "Digging Into Self-Supervised Monocular Depth Estimation" (2018)
- Yang et al. "Deep Virtual Stereo Odometry Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry" (2018)
Requirements:
- Python 3
- PyTorch 1.0
Training and testing were performed on Ubuntu 16.04 with CUDA 8.0 and a GTX 1080 Ti GPU.
- Clone the repository:
git clone https://github.com/victoriamazo/depth_regression.git
- Download the raw KITTI dataset
- Download a pretrained DispNetS model (mono) here
- Download a pretrained ResNet34 model (mono) here
To run the demo on a single image (the downloaded model is expected to be in the 'ckpts' directory):
python demo.py ckpts/DispNetS_ckpt_best.pth.tar --arch DispNetS --image images/000011.png
To run a ResNet model, pass --arch ResNet and --num_layers 34 instead.
All configuration parameters are explained in "config/config_params.md".
- Training and testing as parallel threads
python3 main.py config/conv.json
- Testing
python3 main.py config/conv.json -m test
- Training
python3 main.py config/conv.json -m train
The following results are from evaluation on the raw KITTI dataset (Eigen split):
Method | Abs Rel | Sq Rel | RMSE | RMSE (log) | δ < 1.25 | δ < 1.25² | δ < 1.25³ |
---|---|---|---|---|---|---|---|
DispNetS (mono) | 0.1850 | 0.6659 | 2.8280 | 0.2193 | 0.7064 | 0.9566 | 0.9909 |
ResNet34 (mono) | 0.2665 | 1.1139 | 3.6226 | 0.3720 | 0.4883 | 0.8079 | 0.9120 |
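The columns above are the standard Eigen-split depth metrics. As a reference for how they are defined, here is a NumPy sketch; the function name `eigen_metrics` is illustrative and not taken from this repository:

```python
# Standard depth-evaluation metrics (Eigen et al.) computed on
# valid ground-truth pixels only (gt > 0, since KITTI depth is sparse).
import numpy as np

def eigen_metrics(gt, pred):
    """Return (abs_rel, sq_rel, rmse, rmse_log, d1, d2, d3)."""
    mask = gt > 0
    gt, pred = gt[mask], pred[mask]
    # Threshold accuracy: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    thresh = np.maximum(gt / pred, pred / gt)
    d1 = (thresh < 1.25).mean()
    d2 = (thresh < 1.25 ** 2).mean()
    d3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, d1, d2, d3
```

Lower is better for the four error columns; higher is better for the three δ accuracy columns.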
Qualitative results:
(Image columns: input | ground truth | DispNetS prediction)
(Image columns: input | ground truth | ResNet34 prediction)