Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

This codebase implements the system described in the paper:

UnOS: Unified Unsupervised Optical-flow and Stereo-depth Estimation by Watching Videos

More information can also be found in an earlier version of the paper Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

Yang Wang, Peng Wang, Zhenheng Yang, Yi Yang, Chenxu Luo and Wei Xu

Please contact Yang Wang (wangyangcharles@gmail.com) if you have any questions.

Prerequisites

This codebase was developed and tested with TensorFlow 1.2, CUDA 8.0 and Ubuntu 14.04.

Acknowledgement

Some of the code was borrowed from the excellent work of Tinghui Zhou, Clément Godard, Huangying Zhan, Chelsea Finn, Ruoteng Li and Martin Kersner. The files borrowed from Clément Godard and Huangying Zhan remain under their respective original licenses.

Preparing training data

To train the model, you need to download all of the KITTI raw data and calibration files. To validate the models, you also need the training files of KITTI 2012 and KITTI 2015.

Pretrained models

The pretrained models described in the paper can be downloaded here.

Training

As described in the paper, training is organized into three sequential stages.

Stage 1: only train optical flow

In Stage 1, we only train the PWC-Flow net to learn the optical flow.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4 --mode=flow --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --trace=/path/to/store-your-model-and-logs

After around 200K iterations, you should be able to reach the performance of Ours(PWC-Only) described in the paper. You can also download our pretrained model model-flow.

Stage 2: only train depth and pose

In Stage 2, we train the PWC-Disp and MotionNet to learn the depth and pose.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4  --mode=depth --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --pretrained_model=/path/to/your/pretrained-flow-model-in-stage1  --trace=/path/to/store-your-model-and-logs

After around 200K iterations, you should be able to reach the performance of Ours(Ego-motion) described in the paper. You can also download our pretrained model model-depth.

Stage 3: train optical flow, depth, pose and motion segmentation together

In Stage 3, we train everything together.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4  --mode=depthflow --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --pretrained_model=/path/to/your/pretrained-depth-model-in-stage2 --trace=/path/to/store-your-model-and-logs

After around 200K iterations, you should be able to reach the performance of Ours(Full) described in the paper. You can also download our pretrained model model-depthflow.
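The three stages differ only in --mode, --pretrained_model and --trace, so the commands can be generated by a small helper. The following is a minimal sketch and not part of this codebase: the function name build_train_command and all paths are placeholders, and the check that the depth and depthflow modes receive a pretrained model simply mirrors the recipe above rather than a documented requirement of main.py. Only flags that appear in the commands above are used.

```python
# Sketch only: build the argv list for one training stage.
# Flag names and values are copied from the commands in this README;
# the helper name and every path are hypothetical placeholders.
TRAIN_FILE = "./filenames/kitti_train_files_png_4frames.txt"


def build_train_command(mode, trace_dir, pretrained_model=None,
                        data_dir="/path/to/your/kitti_raw_data",
                        gt_2012_dir="/path/to/your/kitti_2012_gt",
                        gt_2015_dir="/path/to/your/kitti_2015_gt",
                        batch_size=4):
    """Return the command for one stage as a list of arguments."""
    if mode not in ("flow", "depth", "depthflow", "stereo"):
        raise ValueError("unknown mode: %r" % (mode,))
    # Assumption of this sketch: follow the README recipe, where Stage 2
    # starts from the Stage 1 model and Stage 3 from the Stage 2 model.
    if mode in ("depth", "depthflow") and pretrained_model is None:
        raise ValueError("mode %r expects a pretrained model" % (mode,))
    cmd = [
        "python", "main.py",
        "--data_dir=" + data_dir,
        "--batch_size=%d" % batch_size,
        "--mode=" + mode,
        "--train_test=train",
        "--retrain=True",
        "--train_file=" + TRAIN_FILE,
        "--gt_2012_dir=" + gt_2012_dir,
        "--gt_2015_dir=" + gt_2015_dir,
    ]
    if pretrained_model is not None:
        cmd.append("--pretrained_model=" + pretrained_model)
    cmd.append("--trace=" + trace_dir)
    return cmd


if __name__ == "__main__":
    stages = [
        build_train_command("flow", "/path/to/stage1-trace"),
        build_train_command(
            "depth", "/path/to/stage2-trace",
            pretrained_model="/path/to/your/pretrained-flow-model-in-stage1"),
        build_train_command(
            "depthflow", "/path/to/stage3-trace",
            pretrained_model="/path/to/your/pretrained-depth-model-in-stage2"),
    ]
    for cmd in stages:
        print(" ".join(cmd))
```

Each returned list can be printed for inspection, as above, or passed to subprocess.check_call once the placeholder paths are replaced. The stages must still be run one after another, since each later stage reads the model produced by the previous one.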

Aside: Only train depth using stereo

If you would like to train depth using only the stereo pairs, you can run the following script. This differs from Stage 2 in that it trains only the PWC-Disp net, using stereo pairs.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4 --mode=stereo --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --trace=/path/to/store-your-model-and-logs

After around 100K iterations, you should be able to reach the performance of Ours(Stereo-only) described in the paper. You can also download our pretrained model model-stereo.

Notes

You can train on multiple GPUs with the --num_gpus flag.
You can switch to the KITTI odometry split by setting --train_file=./filenames/odo_train_files_png_4frames.txt.
If you would like to continue training a model from a previous checkpoint of the same mode, you can set --retrain=False.
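These three options can be applied to any of the training commands above. The helper below is a hedged sketch, not code from this repository: the name apply_notes is hypothetical, and it assumes that --num_gpus takes an integer count, which is not stated here.

```python
# Sketch only: apply the options from the notes to an existing
# argument list. The helper name is a hypothetical placeholder.
ODO_TRAIN_FILE = "./filenames/odo_train_files_png_4frames.txt"


def apply_notes(cmd, num_gpus=None, odometry_split=False, resume=False):
    """Return a modified copy of cmd; the input list is left unchanged."""
    out = []
    saw_retrain = False
    for arg in cmd:
        if odometry_split and arg.startswith("--train_file="):
            # Switch to the KITTI odometry split.
            arg = "--train_file=" + ODO_TRAIN_FILE
        if arg.startswith("--retrain="):
            saw_retrain = True
            if resume:
                # Continue from a previous checkpoint of the same mode.
                arg = "--retrain=False"
        out.append(arg)
    if resume and not saw_retrain:
        out.append("--retrain=False")
    if num_gpus is not None:
        # Assumption: the flag accepts an integer number of GPUs.
        out.append("--num_gpus=%d" % num_gpus)
    return out


if __name__ == "__main__":
    base = [
        "python", "main.py", "--mode=flow", "--train_test=train",
        "--retrain=True",
        "--train_file=./filenames/kitti_train_files_png_4frames.txt",
    ]
    print(" ".join(apply_notes(base, num_gpus=2, resume=True)))
```

Rewriting the existing flag in place, rather than appending a second copy, avoids relying on how main.py resolves a flag that is given twice.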

Evaluation

Evaluation is performed during training, and the results are printed to the screen.

If you would like to run evaluation only, you can set --train_test=test.

You can test the pose estimates on sequences 09 and 10 by setting --eval_pose=09,10, which works only for the depth and depthflow modes.
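The restriction on --eval_pose is easy to overlook when scripting evaluation runs. The sketch below only assembles the evaluation-related flags and rejects pose evaluation for other modes; the name eval_flags is hypothetical, and the remaining flags (data directories, model path, and so on) are deliberately left to the caller.

```python
# Sketch only: assemble the evaluation-related flags.
# The helper name is a hypothetical placeholder.
POSE_MODES = ("depth", "depthflow")


def eval_flags(mode, pose_sequences=None):
    """Return the mode and evaluation flags as a list of strings."""
    flags = ["--mode=" + mode, "--train_test=test"]
    if pose_sequences:
        # Pose evaluation only works for the depth and depthflow modes.
        if mode not in POSE_MODES:
            raise ValueError(
                "--eval_pose is only supported for modes %s, got %r"
                % (", ".join(POSE_MODES), mode))
        flags.append("--eval_pose=" + ",".join(pose_sequences))
    return flags


if __name__ == "__main__":
    print(" ".join(eval_flags("depthflow", ["09", "10"])))
```

Sequence identifiers are passed as strings so that the leading zero in 09 is preserved.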

Disclaimer

This is the authors' implementation of the system described in the paper and not an official Baidu product.

baidu-research/UnDepthflow