Authors: Jun Zhang*, Mina Henein*, Robert Mahony and Viorela Ila (*equal contribution)
VDO-SLAM is a Visual Object-aware Dynamic SLAM library for RGB-D cameras. It tracks dynamic objects; estimates the camera poses together with the static and dynamic structure and the full SE(3) pose change of every rigid object in the scene; extracts velocity information; and has been demonstrated in real-world outdoor scenarios. We provide examples to run the SLAM system on the KITTI Tracking Dataset and on the Oxford Multi-motion Dataset.
VDO-SLAM is released under a GPLv3 license. For a list of all code/library dependencies (and associated licenses), please see Dependencies.md.
If you use VDO-SLAM in an academic work, please cite:
@article{zhang2020vdoslam,
title={{VDO-SLAM: A Visual Dynamic Object-aware SLAM System}},
author={Zhang, Jun and Henein, Mina and Mahony, Robert and Ila, Viorela},
year={2020},
eprint={2005.11052},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
Related Publications:
- VDO-SLAM: A Visual Dynamic Object-aware SLAM System
  Jun Zhang*, Mina Henein*, Robert Mahony and Viorela Ila. arXiv:2005.11052.
- Robust Ego and Object 6-DoF Motion Estimation and Tracking
  Jun Zhang, Mina Henein, Robert Mahony and Viorela Ila. The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020).
- Dynamic SLAM: The Need For Speed
  Mina Henein, Jun Zhang, Robert Mahony and Viorela Ila. The International Conference on Robotics and Automation (ICRA 2020).
We have tested the library on Mac OS X 10.14 and Ubuntu 16.04, but it should be easy to compile on other platforms.
We use some C++11 features; the tested gcc version is 9.2.1 (Ubuntu) and the tested clang version is 1000.11.45.5 (Mac).
We use OpenCV to manipulate images and features. Download and install instructions can be found at: http://opencv.org. At least version 3.0 is required; tested with OpenCV 3.4.
Eigen3 is required by g2o (see below). Download and install instructions can be found at: http://eigen.tuxfamily.org. At least version 3.1.0 is required.
We use modified versions of the g2o library to perform non-linear optimizations. The modified libraries (BSD-licensed) are included in the dependencies folder.
For Ubuntu users, a Dockerfile is provided that automatically installs all dependencies in a reproducible environment; it has been built and tested with the KITTI dataset. (Thanks @satyajitghana for the contributions 👍 )
Clone the repository:
git clone https://github.com/halajun/VDO_SLAM.git VDO-SLAM
We provide a script, build.sh, to build the dependency libraries and VDO-SLAM.
Please make sure you have installed all required dependencies (see section 2).
Please also change the library file suffix, i.e., '.dylib' for Mac (default) or '.so' for Ubuntu, in the main CMakeLists.txt.
Then execute:
cd VDO-SLAM
chmod +x build.sh
./build.sh
This will create:
- libObjSLAM.dylib (Mac) or libObjSLAM.so (Ubuntu) in the lib folder,
- libg2o.dylib (Mac) or libg2o.so (Ubuntu) in the dependencies/g2o/lib folder,
- and the executable vdo_slam in the example folder.
To run the KITTI Tracking Dataset demo:
- Download the demo sequence kitti_demo and uncompress it.
- Execute the following command:
./example/vdo_slam example/kitti-0000-0013.yaml PATH_TO_KITTI_SEQUENCE_DATA_FOLDER
To run the Oxford Multi-motion Dataset demo:
- Download the demo sequence omd_demo and uncompress it.
- Execute the following command:
./example/vdo_slam example/omd.yaml PATH_TO_OMD_SEQUENCE_DATA_FOLDER
You will need to create a settings (yaml) file with the calibration of your camera; see the settings files provided in the example/ folder. RGB-D input must be synchronized and depth-registered. A list of timestamps for the images is also required as input.
The system also requires pre-processed images as input, namely instance-level semantic segmentation and optical flow. In our experiments, we used Mask R-CNN for instance segmentation (for KITTI only; for OMD we applied a colour-based method to segment the cuboids, see the MATLAB code in the tools folder) and PWC-Net (PyTorch version) for optical flow estimation. Other state-of-the-art methods can also be substituted for better performance.
For evaluation purposes, ground-truth camera poses and object poses are also needed as input. The input formats are as follows:
- The segmentation mask is saved in a .txt file as a matrix of the same size as the image. Each element is an integer: 0 stands for background, and 1, 2, ..., n stand for the different instance labels. Note that, to compare easily with the ground-truth object motion in the KITTI dataset, we align the estimated mask labels with the ground-truth labels. The code for generating the .txt files (from .mask) and for the alignment is in the tools folder.
- The optical flow input is the standard .flo file, which can be read and processed directly using OpenCV.
- The ground-truth camera pose is saved as a .txt file. Each row is organized as follows:
FrameID R11 R12 R13 t1 R21 R22 R23 t2 R31 R32 R33 t3 0 0 0 1
Here Rij are the coefficients of the camera rotation matrix R and ti are the coefficients of the camera translation vector t.
- The ground-truth object pose is also saved as a .txt file. For example, for the KITTI Tracking Dataset each row is organized as follows:
FrameID ObjectID B1 B2 B3 B4 t1 t2 t3 r1
where ti are the coefficients of the 3D object location t in camera coordinates, and r1 is the rotation around the Y-axis in camera coordinates. B1-B4 are the 2D bounding box of the object in the image, used for visualization. Please refer to the KITTI Tracking Dataset for further details if necessary.
For the OMD dataset, the provided object pose format is an axis-angle rotation plus a translation vector; please see the provided demos for details. Users can supply a custom data format, but will need to write a new function to load the data.
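For readers writing their own loader, here is a minimal sketch that parses the segmentation-mask .txt format in plain C++ (no OpenCV). It assumes integers are whitespace-separated with one image row per line; that layout is an assumption, not something this README specifies. parseInstanceMask and instanceLabels are hypothetical helper names, not part of the VDO-SLAM API.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// mask[row][col] is 0 for background or an instance label 1..n.
using InstanceMask = std::vector<std::vector<int>>;

// Parses a whitespace-separated integer matrix, one image row per line
// (assumed layout). Blank lines are skipped; reading a row stops at the
// first non-integer token. All rows must have the same number of columns.
InstanceMask parseInstanceMask(std::istream& in) {
    InstanceMask mask;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::vector<int> values;
        int v = 0;
        while (row >> v) values.push_back(v);
        if (values.empty()) continue;  // skip blank lines
        if (!mask.empty() && values.size() != mask.front().size())
            throw std::runtime_error("ragged mask row");
        mask.push_back(std::move(values));
    }
    return mask;
}

// Returns the distinct non-zero (i.e. object) labels present in the mask.
std::set<int> instanceLabels(const InstanceMask& mask) {
    std::set<int> labels;
    for (const auto& row : mask)
        for (int v : row)
            if (v != 0) labels.insert(v);
    return labels;
}
```

A caller would typically also check that mask.size() and mask.front().size() match the image height and width before using the labels.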
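The .flo format is simple enough to read without OpenCV, which can help when inspecting pre-processed flow outside the pipeline. This minimal sketch assumes the standard Middlebury layout (a float tag 202021.25, an int32 width and height, then width*height interleaved float (u, v) pairs in row-major order) and a little-endian host with IEEE-754 floats. FlowField and readFlo are hypothetical names, not VDO-SLAM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Dense optical flow: one (u, v) displacement per pixel, row-major.
struct FlowField {
    int width = 0;
    int height = 0;
    std::vector<float> uv;  // 2 * width * height values, interleaved u, v

    float u(int row, int col) const { return uv[2 * (row * width + col)]; }
    float v(int row, int col) const { return uv[2 * (row * width + col) + 1]; }
};

// Reads a Middlebury .flo stream. Assumes a little-endian host, since the
// header and data are copied byte-for-byte into native int32/float values.
FlowField readFlo(std::istream& in) {
    float tag = 0.0f;
    std::int32_t width = 0, height = 0;
    in.read(reinterpret_cast<char*>(&tag), sizeof tag);
    in.read(reinterpret_cast<char*>(&width), sizeof width);
    in.read(reinterpret_cast<char*>(&height), sizeof height);
    if (!in || tag != 202021.25f)
        throw std::runtime_error("not a .flo stream (bad header)");
    if (width <= 0 || height <= 0)
        throw std::runtime_error("invalid .flo dimensions");

    FlowField flow;
    flow.width = width;
    flow.height = height;
    flow.uv.resize(2 * static_cast<std::size_t>(width) * height);
    in.read(reinterpret_cast<char*>(flow.uv.data()),
            static_cast<std::streamsize>(flow.uv.size() * sizeof(float)));
    if (!in) throw std::runtime_error("truncated .flo data");
    return flow;
}
```

For a file on disk, the same function works with a std::ifstream opened with std::ios::binary.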
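Each ground-truth camera pose row is a frame index followed by a 4x4 homogeneous matrix [R t; 0 0 0 1] flattened row by row. The following sketch, using only the standard library, parses one such row and checks the final 0 0 0 1 row. CameraPose, parseCameraPoseRow and translation are hypothetical names; reading FrameID as a double (so that "3" and "3.0" both work) is a defensive assumption about how the file is written.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>

struct CameraPose {
    int frameId = 0;
    std::array<std::array<double, 4>, 4> T{};  // row-major [R t; 0 0 0 1]
};

// Parses "FrameID R11 R12 R13 t1 R21 R22 R23 t2 R31 R32 R33 t3 0 0 0 1".
// Any fields after the 16 matrix entries are ignored.
CameraPose parseCameraPoseRow(const std::string& line) {
    std::istringstream in(line);
    CameraPose pose;
    double id = 0.0;
    if (!(in >> id)) throw std::runtime_error("missing FrameID");
    pose.frameId = static_cast<int>(std::lround(id));
    for (auto& row : pose.T)
        for (double& x : row)
            if (!(in >> x)) throw std::runtime_error("expected 16 matrix entries");
    // A rigid-body transform must end with the row (0, 0, 0, 1).
    const auto& last = pose.T[3];
    const double tol = 1e-9;
    if (std::abs(last[0]) > tol || std::abs(last[1]) > tol ||
        std::abs(last[2]) > tol || std::abs(last[3] - 1.0) > tol)
        throw std::runtime_error("last row is not 0 0 0 1");
    return pose;
}

// The translation t = (t1, t2, t3) is the last column of the top three rows.
std::array<double, 3> translation(const CameraPose& p) {
    return {p.T[0][3], p.T[1][3], p.T[2][3]};
}
```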
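To use the object ground truth as rotations, the KITTI angle r1 can be turned into a rotation about the camera Y-axis, and an axis-angle vector (as provided for OMD) can be turned into a rotation matrix with Rodrigues' formula, R = I + sin(θ) K + (1 - cos(θ)) K², where θ is the vector's norm and K is the skew-symmetric matrix of its unit axis. The sketch below shows both in plain C++. The helper names are hypothetical, the Y-axis matrix follows the usual KITTI devkit convention, and the order of the axis-angle components within an OMD row is not specified here, so extracting them from a file is left to the reader.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>

using Mat3 = std::array<std::array<double, 3>, 3>;

// One KITTI-style object row: FrameID ObjectID B1 B2 B3 B4 t1 t2 t3 r1
struct KittiObjectPose {
    int frameId = 0;
    int objectId = 0;
    std::array<double, 4> box{};  // 2D bounding box, visualization only
    std::array<double, 3> t{};    // 3D location in camera coordinates
    double ry = 0.0;              // rotation around the camera Y-axis (rad)
};

KittiObjectPose parseKittiObjectRow(const std::string& line) {
    std::istringstream in(line);
    KittiObjectPose p;
    double frame = 0.0, object = 0.0;
    in >> frame >> object;
    for (double& b : p.box) in >> b;
    for (double& x : p.t) in >> x;
    in >> p.ry;
    if (!in) throw std::runtime_error("expected 10 fields");
    p.frameId = static_cast<int>(std::lround(frame));
    p.objectId = static_cast<int>(std::lround(object));
    return p;
}

// Rotation by ry about the Y-axis (usual KITTI convention).
Mat3 rotationY(double ry) {
    const double c = std::cos(ry), s = std::sin(ry);
    return {{{c, 0.0, s}, {0.0, 1.0, 0.0}, {-s, 0.0, c}}};
}

// Rodrigues' formula: axis-angle vector w (direction = axis, norm = angle).
Mat3 axisAngleToMatrix(const std::array<double, 3>& w) {
    const double th = std::sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);
    Mat3 R{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    if (th < 1e-12) return R;  // (near-)zero rotation: identity
    const double kx = w[0] / th, ky = w[1] / th, kz = w[2] / th;
    const Mat3 K{{{0, -kz, ky}, {kz, 0, -kx}, {-ky, kx, 0}}};
    const double s = std::sin(th), c1 = 1.0 - std::cos(th);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double kk = 0.0;  // (K * K)(i, j)
            for (int m = 0; m < 3; ++m) kk += K[i][m] * K[m][j];
            R[i][j] += s * K[i][j] + c1 * kk;
        }
    return R;
}
```

As a consistency check, axisAngleToMatrix({0, r1, 0}) gives the same matrix as rotationY(r1), since a KITTI yaw angle is just an axis-angle rotation about the Y-axis.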