monodepth

TensorFlow implementation of unsupervised single-image depth prediction using a convolutional neural network.

Unsupervised Monocular Depth Estimation with Left-Right Consistency
Clément Godard, Oisin Mac Aodha and Gabriel J. Brostow
CVPR 2017

For more details:
project page
arXiv

Requirements

This code was tested with TensorFlow 1.0, CUDA 8.0 and Ubuntu 16.04.
Training takes about 30 hours with the default parameters on the kitti split on a single Titan X machine.
You can train on multiple GPUs by setting their number with the --num_gpus flag; make sure your batch_size is divisible by num_gpus.
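The divisibility requirement exists because each GPU gets an equal share of every batch. The helpers below are hypothetical and not part of monodepth; they only illustrate that constraint.

```python
def per_gpu_batch_size(batch_size: int, num_gpus: int) -> int:
    """Samples each GPU receives when a batch is split evenly (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if batch_size % num_gpus != 0:
        # The same constraint the README states for --batch_size / --num_gpus.
        raise ValueError(
            f"batch_size={batch_size} is not divisible by num_gpus={num_gpus}"
        )
    return batch_size // num_gpus


def split_batch(batch, num_gpus):
    """Split a list-like batch into num_gpus equal, contiguous shards."""
    size = per_gpu_batch_size(len(batch), num_gpus)
    return [batch[i * size:(i + 1) * size] for i in range(num_gpus)]


if __name__ == "__main__":
    print(per_gpu_batch_size(8, 2))        # 4
    print(split_batch(list(range(8)), 4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

With the default batch size of 8, you can use 1, 2, 4 or 8 GPUs. Using 3 GPUs raises an error.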

I just want to try it on an image!

There is a simple mode, monodepth_simple.py, which allows you to quickly run our model on a test image.
Make sure you first download one of the pretrained models; in this example we will use model_cityscapes.

python monodepth_simple.py --image_path ~/my_image.jpg --checkpoint_path ~/models/model_cityscapes

Please note that there is NO file extension after the checkpoint name.
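TensorFlow usually stores one checkpoint as several files that share a prefix, such as .index, .meta and .data-00000-of-00001. The --checkpoint_path flag expects that shared prefix. The checkpoint_prefix helper below is hypothetical. It is a small sketch for turning any one of those file paths back into the prefix, and it assumes those standard suffixes.

```python
import re

# Suffixes TensorFlow commonly appends to the files of a single checkpoint.
_CKPT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path: str) -> str:
    """Strip a TensorFlow checkpoint file extension, if present (hypothetical helper)."""
    return _CKPT_SUFFIX.sub("", path)


print(checkpoint_prefix("~/models/model_cityscapes.index"))  # ~/models/model_cityscapes
print(checkpoint_prefix("~/models/model_cityscapes"))        # unchanged
```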

Data

This model requires rectified stereo pairs for training.
There are two main datasets available:

KITTI

We used two different splits of the data, kitti and eigen, amounting to 29000 and 22600 training samples respectively. You can find them in the filenames folder.
You can download the entire raw dataset by running:

wget -i utils/kitti_archives_to_download.txt -P ~/my/output/folder/

Warning: it weighs about 175GB, so make sure you have enough space to unzip it too!
To save space you can convert the png images to jpeg.

find ~/my/output/folder/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'
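If GNU parallel or ImageMagick's convert is not available, the same idea can be sketched in Python. This assumes the Pillow package is installed, which is not a stated monodepth requirement. Pillow's default JPEG quality may differ from ImageMagick's. The function names are hypothetical, and the default dry run only lists the planned conversions.

```python
from pathlib import Path


def png_to_jpg_targets(root):
    """Yield (png_path, jpg_path) pairs for every .png under root, recursively."""
    for png in sorted(Path(root).expanduser().rglob("*.png")):
        yield png, png.with_suffix(".jpg")


def convert_all(root, dry_run=True):
    """Convert each PNG to JPEG and delete the original, like the shell one-liner.

    With dry_run=True nothing is modified; the planned pairs are returned.
    Requires Pillow (an assumption) when dry_run=False.
    """
    pairs = list(png_to_jpg_targets(root))
    if dry_run:
        return pairs
    from PIL import Image  # imported lazily so dry runs need no extra package
    for png, jpg in pairs:
        with Image.open(png) as img:
            img.convert("RGB").save(jpg, "JPEG")
        png.unlink()  # only reached if saving succeeded, mirroring `&& rm {}`
    return pairs
```

Run convert_all("~/my/output/folder/") first to check the file list, then call it again with dry_run=False.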

Cityscapes

You will need to register in order to download the data, which already has a train/val/test set with 22973 training images.
We used leftImg8bit_trainvaltest.zip, rightImg8bit_trainvaltest.zip, leftImg8bit_trainextra.zip and rightImg8bit_trainextra.zip, which weigh 110GB in total.

Training

Warning: The input sizes need to be multiples of 128 for vgg or 64 for resnet50.
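The check below is a small sketch of that constraint. It also picks the largest valid size that does not exceed a given dimension, which is one way to choose a crop or resize target. Both helpers are hypothetical and not part of the repository.

```python
# Required divisor per encoder, as stated in the warning above.
SIZE_MULTIPLE = {"vgg": 128, "resnet50": 64}


def check_input_size(height: int, width: int, encoder: str = "vgg") -> bool:
    """True if both dimensions are multiples of the encoder's divisor."""
    m = SIZE_MULTIPLE[encoder]
    return height % m == 0 and width % m == 0


def round_down_to_valid(size: int, encoder: str = "vgg") -> int:
    """Largest multiple of the divisor not exceeding size (at least one multiple)."""
    m = SIZE_MULTIPLE[encoder]
    return max(m, (size // m) * m)


print(check_input_size(256, 512))             # True: the default 512x256 input
print(round_down_to_valid(375, "resnet50"))   # 320
print(round_down_to_valid(1242, "vgg"))       # 1152
```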

The model's dataloader expects a data folder path as well as a list of filenames (relative to the root data folder):

python monodepth_main.py --mode train --model_name my_model --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_train_files.txt --log_directory ~/tmp/
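The sketch below shows how such a filenames list can be resolved against --data_path. It assumes that each line holds a left image path and, optionally, a right image path separated by whitespace. The testing section mentions lines with two files, which supports this assumption. The function name is hypothetical and is not the repository's dataloader.

```python
import os


def read_filenames(filenames_file, data_path):
    """Parse a filenames list into absolute (left, right) image paths.

    Assumption: each non-empty line is "left [right]", relative to data_path;
    right is None when a line has only one path.
    """
    root = os.path.expanduser(data_path)
    pairs = []
    with open(os.path.expanduser(filenames_file)) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            left = os.path.join(root, parts[0])
            right = os.path.join(root, parts[1]) if len(parts) > 1 else None
            pairs.append((left, right))
    return pairs
```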

You can continue training by loading the last saved checkpoint using --checkpoint_path and pointing to it:

python monodepth_main.py --mode train --model_name my_model --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_train_files.txt --log_directory ~/tmp/ \
--checkpoint_path ~/tmp/my_model/model-50000
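Checkpoints in this example live in log_directory/model_name and are named model-<step>. Below is a sketch of picking the most recent one to resume from. It assumes TensorFlow's V2 checkpoint format, which writes one .index file per checkpoint. TensorFlow's own tf.train.latest_checkpoint reads the "checkpoint" bookkeeping file instead. This helper is hypothetical.

```python
import os
import re


def latest_checkpoint(model_dir):
    """Prefix of the highest-step model-<step> checkpoint in model_dir, or None.

    Steps are compared numerically, so model-181250 beats model-9.
    """
    model_dir = os.path.expanduser(model_dir)
    pattern = re.compile(r"^model-(\d+)\.index$")
    best_step, best = -1, None
    for name in os.listdir(model_dir):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            best = os.path.join(model_dir, f"model-{best_step}")
    return best
```

Pass the returned prefix, which has no extension, to --checkpoint_path.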

You can also fine-tune from a checkpoint using --retrain.
You can monitor the learning process using tensorboard and pointing it to your chosen log_directory.
By default the model only saves a reduced summary to save disk space; you can disable this using --full_summary.
Please look at the main file for all the available options.

Testing

To test, change the --mode flag to test. The network will output the disparities in the model folder, or in any other folder you specify with --output_directory.
You will also need to load the checkpoint you want to test; this can be done with --checkpoint_path:

python monodepth_main.py --mode test --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_stereo_2015_test_files.txt --log_directory ~/tmp/ \
--checkpoint_path ~/tmp/my_model/model-181250

Please note that there is NO file extension after the checkpoint name.
If your test filenames contain two files per line, the model will ignore the second one unless you use the --do_stereo flag. The network will output two files, disparities.npy and disparities_pp.npy, for the raw and post-processed disparities respectively.
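The post-processing step is not detailed here. The paper describes running the network on both the image and its horizontal mirror, then blending the two predictions. The NumPy sketch below illustrates that idea. It assumes a pair of disparity maps shaped (2, H, W), and it may not match exactly what produces disparities_pp.npy.

```python
import numpy as np


def post_process_disparity(disp):
    """Blend a disparity map with the prediction for the mirrored image.

    disp: array of shape (2, H, W); disp[0] is predicted on the original image,
    disp[1] on the horizontally flipped image.
    """
    _, h, w = disp.shape
    l_disp = disp[0]
    r_disp = np.fliplr(disp[1])  # undo the flip so both maps are aligned
    m_disp = 0.5 * (l_disp + r_disp)
    # Horizontal pixel coordinate normalised to [0, 1].
    x, _ = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    # Weight 1 for the leftmost 5% of the width, ramping to 0 by 10%.
    l_mask = 1.0 - np.clip(20 * (x - 0.05), 0, 1)
    r_mask = np.fliplr(l_mask)
    # Left border: flipped-back map; right border: original map; centre: average.
    return r_mask * l_disp + l_mask * r_disp + (1.0 - l_mask - r_mask) * m_disp
```

The three weights sum to one at every pixel, so a constant disparity map passes through unchanged.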

Evaluation on KITTI

To evaluate run:

python utils/evaluate_kitti.py --split kitti --predicted_disp_path ~/tmp/my_model/disparities.npy \
--gt_path ~/data/KITTI/

The --split flag allows you to choose which dataset you want to test on.

kitti corresponds to the 200 official training set pairs from KITTI stereo 2015.
eigen corresponds to the 697 test images used by Eigen NIPS14 and uses the raw LIDAR points.

Warning: The results on the Eigen split are usually cropped, which you can do by passing the --garg_crop flag.
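Depth methods evaluated on KITTI commonly report the error and accuracy metrics introduced by Eigen et al. Below is a minimal sketch of those metrics. It assumes the predictions have already been converted to depth and restricted to pixels with ground truth. evaluate_kitti.py may differ in details such as depth capping or cropping.

```python
import numpy as np


def compute_depth_errors(gt, pred):
    """Eigen-style depth metrics on valid pixels (positive depths only).

    Returns (abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3).
    """
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    ratio = np.maximum(gt / pred, pred / gt)
    a1 = np.mean(ratio < 1.25)        # fraction within 25% of ground truth
    a2 = np.mean(ratio < 1.25 ** 2)
    a3 = np.mean(ratio < 1.25 ** 3)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```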

Models

You can download our pre-trained models to an existing directory by running:

sh ./utils/get_model.sh model_name output_directory

All our models were trained for 50 epochs at 512x256 resolution with a batch size of 8; please see our paper for more details.
We converted KITTI and Cityscapes to jpeg before training.
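With this schedule, the number of training steps follows from the split sizes given in the Data section. The calculation below gives 181250 steps for the kitti split, which matches the checkpoint name model-181250 in the testing example. How partial batches are rounded is an assumption here; it makes no difference for these split sizes.

```python
import math


def total_training_steps(num_samples: int, epochs: int, batch_size: int) -> int:
    """Optimisation steps for a schedule, rounding partial batches up (assumption)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


print(total_training_steps(29000, 50, 8))  # kitti split: 181250
print(total_training_steps(22600, 50, 8))  # eigen split: 141250
```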
Here are all the models available:

model_kitti: Our main model trained on the kitti split
model_eigen: Our main model trained on the eigen split
model_cityscapes: Our main model trained on cityscapes
model_city2kitti: model_cityscapes fine-tuned on kitti
model_city2eigen: model_cityscapes fine-tuned on eigen
model_kitti_stereo: Our stereo model trained on the kitti split for 12 epochs; make sure to use --do_stereo when using it

All our models, except for the stereo one, have a ResNet50 variant, which you can get by adding _resnet to the model name.
To test or train using these variants, you need to use the flag --encoder resnet50.
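The sketch below summarises which extra flags each listed model needs, following the rules above: _resnet variants need --encoder resnet50 and the stereo model needs --do_stereo. The helper names are hypothetical.

```python
BASE_MODELS = [
    "model_kitti",
    "model_eigen",
    "model_cityscapes",
    "model_city2kitti",
    "model_city2eigen",
    "model_kitti_stereo",
]


def available_models():
    """Listed models plus their _resnet variants (none for the stereo model)."""
    resnet = [m + "_resnet" for m in BASE_MODELS if m != "model_kitti_stereo"]
    return BASE_MODELS + resnet


def extra_flags(model_name):
    """Flags a model needs on top of the usual --mode/--checkpoint_path ones."""
    if model_name not in available_models():
        raise ValueError(f"unknown model: {model_name}")
    flags = []
    if model_name.endswith("_resnet"):
        flags += ["--encoder", "resnet50"]
    if model_name == "model_kitti_stereo":
        flags.append("--do_stereo")
    return flags


print(extra_flags("model_eigen_resnet"))  # ['--encoder', 'resnet50']
```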

Results

You can download our results (unscaled disparities at 512x256) on both KITTI splits (kitti and eigen) here.
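This sketch shows how to turn such disparities into metric depth. It assumes that "unscaled" means the values are a fraction of the image width, so multiplying by the original image width gives disparity in pixels. Depth then follows from the rectified-stereo relation depth = focal × baseline / disparity. The focal length and baseline below are approximate, illustrative KITTI-like values, not read from calibration. Resizing the 512x256 maps to the original resolution is assumed to happen beforehand and is not shown.

```python
import numpy as np


def disparity_to_depth(disp, image_width, focal_length_px, baseline_m, eps=1e-6):
    """Convert width-normalised disparity to depth in metres (assumptions above)."""
    disp_px = np.asarray(disp, dtype=np.float64) * image_width
    # eps avoids division by zero where the predicted disparity is 0.
    return focal_length_px * baseline_m / np.maximum(disp_px, eps)


# Illustrative values only; take the real ones from the dataset calibration.
depth = disparity_to_depth([0.1, 0.05], image_width=1242,
                           focal_length_px=721.5, baseline_m=0.54)
print(depth)  # halving the disparity doubles the depth
```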
The naming convention is the same as with the models.

Reference

If you find our work useful in your research please consider citing our paper:

@inproceedings{monodepth17,
  title     = {Unsupervised Monocular Depth Estimation with Left-Right Consistency},
  author    = {Cl{\'{e}}ment Godard and
               Oisin {Mac Aodha} and
               Gabriel J. Brostow},
  booktitle = {CVPR},
  year = {2017}
}


License

This Software is licensed under the terms of the UCLB ACP-A Licence which allows for non-commercial use only, the full terms of which are made available in the LICENSE file. For any other use of the software not covered by the terms of this licence, please contact info@uclb.com
