This codebase implements the system described in the paper:
Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid
NeurIPS 2019
See the paper on [arXiv] and the [project webpage] for more details.
- A geometry consistency loss for enforcing the scale-consistency of predictions between consecutive frames.
- A self-discovered mask for detecting moving objects and occlusions.
- Enabling the unsupervised estimator (learned from monocular videos) to do visual odometry on a long video.
This codebase was developed and tested with Python 3.6, PyTorch 1.0.1, and CUDA 10.0 on Ubuntu 16.04. It is based on Clement Pinard's SfMLearner implementation, to which we made only minor modifications and added our proposed losses.
pip3 install -r requirements.txt
or manually install the following packages:
torch >= 1.0.1
imageio
matplotlib
scipy
argparse
tensorboardX
blessings
progressbar2
path.py
evo
Python 3 bindings for OpenCV are also recommended for TensorBoard visualizations.
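
To check an existing environment before training, a small script along the following lines may help. It is only a sketch: the mapping from package names to import names is an assumption (for instance, `progressbar2` is assumed to import as `progressbar`, `path.py` as `path`, and OpenCV as `cv2`), and `argparse` is left out because it ships with the Python 3 standard library.

```python
import importlib.util

# Package name -> module name to look up. The module names are assumptions;
# adjust them if your installed versions expose different import names.
REQUIRED_MODULES = {
    "torch": "torch",
    "imageio": "imageio",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "tensorboardX": "tensorboardX",
    "blessings": "blessings",
    "progressbar2": "progressbar",
    "path.py": "path",
    "evo": "evo",
    "opencv (optional)": "cv2",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only confirms that each module can be located; it does not verify version constraints such as `torch >= 1.0.1`.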
See "scripts/run_prepare_data.sh" for examples, including KITTI Raw, Cityscapes, and KITTI Odometry.
For the KITTI Raw dataset, download the dataset using this script provided on the official website.
For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. You will probably need to contact the administrators to get access to them.
For the KITTI Odometry dataset, download the version with color images.
The "scripts" folder provides several examples for training and testing.
You can train the depth model on KITTI Raw by running
sh scripts/train_resnet_256.sh
or train the pose model on KITTI Odometry by running
sh scripts/train_posenet_256.sh
Then you can start a TensorBoard session in this folder by
tensorboard --logdir=checkpoints/
and visualize the training progress by opening http://localhost:6006 in your browser.
You can evaluate depth using Eigen's split by running
sh scripts/run_depth_test.sh
and evaluate visual odometry by running
sh scripts/run_vo_test.sh
You can also evaluate 5-frame pose, as in SfMLearner, by running
sh scripts/run_pose_test.sh
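
Conceptually, visual odometry over a long sequence chains frame-to-frame pose predictions into a single trajectory, which only works well when every prediction shares the same scale. The sketch below shows that chaining in plain Python with 4x4 homogeneous matrices. The function names are hypothetical, and the pose convention (each relative pose maps points from frame t+1 into frame t) is an assumption, so it may differ from what the test scripts use.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def identity4():
    """4x4 identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def accumulate_trajectory(relative_poses):
    """Chain relative poses into global poses expressed in the first frame.

    Assumes relative_poses[t] maps points from frame t+1 into frame t, so
    the global pose of frame t+1 is global_pose[t] @ relative_poses[t].
    """
    current = identity4()
    trajectory = [current]
    for relative in relative_poses:
        current = matmul4(current, relative)
        trajectory.append(current)
    return trajectory


if __name__ == "__main__":
    # Two identical steps of one unit forward along the z axis.
    step = identity4()
    step[2][3] = 1.0
    poses = accumulate_trajectory([step, step])
    print("final z position:", poses[-1][2][3])
```

If the scale of each relative translation drifted from frame to frame, the accumulated positions would be distorted, which is why scale-consistent predictions matter for this use case.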
Note that the depth models are trained on the KITTI Raw dataset and the pose models are trained on the KITTI Odometry dataset; the two are not coupled.
Models | Abs Rel | Sq Rel | RMSE | RMSE(log) | Acc.1 | Acc.2 | Acc.3 |
---|---|---|---|---|---|---|---|
k_depth | 0.137 | 1.089 | 5.439 | 0.217 | 0.830 | 0.942 | 0.975 |
cs+k_depth | 0.128 | 1.047 | 5.234 | 0.208 | 0.846 | 0.947 | 0.976 |
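
For readers unfamiliar with the column names, the sketch below computes these quantities using the definitions commonly used in the monocular depth literature. It assumes that Acc.1, Acc.2, and Acc.3 denote the fraction of pixels whose ratio max(gt/pred, pred/gt) is below 1.25, 1.25², and 1.25³. The function name is hypothetical, and the evaluation script in this repository may apply additional steps (such as cropping, depth capping, or scale alignment) that are omitted here.

```python
import math


def depth_metrics(gt, pred):
    """Compute standard depth errors from flat lists of positive depths."""
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )
    # Accuracy under threshold: share of pixels with a small depth ratio.
    ratios = [max(g / p, p / g) for g, p in pairs]
    acc1, acc2, acc3 = (
        sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)
    )
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "acc1": acc1,
        "acc2": acc2,
        "acc3": acc3,
    }
```

Lower is better for the four error columns, and higher is better for the three accuracy columns.
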
Models | Metric | Seq. 09 | Seq. 10 |
---|---|---|---|
k_pose | t_err (%) | 11.2 | 10.1 |
k_pose | r_err (degree/100m) | 3.35 | 4.96 |
cs+k_pose | t_err (%) | 8.24 | 10.7 |
cs+k_pose | r_err (degree/100m) | 2.19 | 4.58 |
@inproceedings{bian2019depth,
title={Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video},
author={Bian, Jia-Wang and Li, Zhichao and Wang, Naiyan and Zhan, Huangying and Shen, Chunhua and Cheng, Ming-Ming and Reid, Ian},
booktitle= {Thirty-third Conference on Neural Information Processing Systems (NeurIPS)},
year={2019}
}