This is a dense SLAM system written in C++. It builds on InfiniTAM, adding support for stereo input and separate dynamic object (e.g., car) reconstruction.
It is currently under development as my Master's thesis in the Computer Vision and Geometry Group at ETH Zurich.
The following screenshot shows an early preview of DynSLAM in action. The system takes in stereo input, computes the depth map using either ELAS or DispNet, segments the input RGB using Multi-task Network Cascades to detect object instances, and then separately reconstructs the static background and the individual object instances.
The top pane shows the dense reconstruction of the background. The following panes show, in top-to-bottom, left-to-right order: the left RGB frame, the computed depth map, the output of the instance-aware semantic segmentation algorithm, the input RGB to the instance reconstructor, memory usage statistics, and a novel view of the reconstructed object instance.
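As a rough illustration of the "separately reconstructs" step, the sketch below splits one depth frame into a background part and an instance part using a per-pixel mask. It is only a sketch under stated assumptions, not the code DynSLAM uses: the names `SplitDepth` and `SplitDepthByMask` are hypothetical, images are assumed to be flat row-major buffers, and a depth of 0 is assumed to mark an invalid pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two halves of a split depth frame.
struct SplitDepth {
  std::vector<float> background;  // Depth with all instance pixels invalidated.
  std::vector<float> instance;    // Depth with all non-instance pixels invalidated.
};

// Splits a depth map using a per-pixel instance mask (non-zero means "belongs to the instance").
// Assumption: a depth value of 0 marks an invalid pixel which later fusion would skip.
SplitDepth SplitDepthByMask(const std::vector<float> &depth,
                            const std::vector<std::uint8_t> &mask) {
  assert(depth.size() == mask.size());
  SplitDepth result;
  result.background.assign(depth.size(), 0.0f);
  result.instance.assign(depth.size(), 0.0f);
  for (std::size_t i = 0; i < depth.size(); ++i) {
    if (mask[i] != 0) {
      result.instance[i] = depth[i];
    } else {
      result.background[i] = depth[i];
    }
  }
  return result;
}
```

Under these assumptions, each half could then be passed to its own reconstruction, so that a car's depth measurements never get fused into the static map.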
The colors in the 3D reconstructions correspond to the voxel weights: red-tinted areas have low weights, whereas blue areas have high weights. Areas which remain low-weight even several frames after first being observed are very likely to be noisy, while blue areas are those where the system is confident in its reconstruction.
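The weight-based coloring can be pictured with the following minimal sketch. The linear red-to-blue blend, the `max_weight` parameter, and the names `Rgb` and `WeightToTint` are all assumptions made for illustration; the actual renderer may map weights to colors differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical RGB triple with channels in [0, 1].
struct Rgb {
  float r, g, b;
};

// Maps a voxel weight to a tint: pure red at weight 0, pure blue at max_weight and above.
// Assumption: a simple linear blend between the two extremes.
Rgb WeightToTint(float weight, float max_weight) {
  assert(max_weight > 0.0f);
  const float t = std::min(std::max(weight / max_weight, 0.0f), 1.0f);
  return Rgb{1.0f - t, 0.0f, t};
}
```

With this mapping, a voxel observed only once or twice stays close to red, while one that has accumulated many consistent observations saturates to blue.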
- My InfiniTAM fork, which is used by this system for the actual 3D reconstruction (via volumetric fusion, using voxel hashing for map storage). My fork contains a series of small tweaks designed to make InfiniTAM a little easier to use as a component of a larger system.
- My fork of the official implementation of Multi-task Network Cascades for image semantic segmentation. We need this for identifying where the cars are in the input videos. Using semantics enables us to detect both moving and static cars.
- My fork of the modified Caffe used by MNC. Since MNC's architecture requires some tweaks to Caffe's internals, its authors forked Caffe and modified it to their needs. I forked their fork and made it work with my tools, while also making it faster by merging it with the Caffe master, which enabled cuDNN 5 support, among many other things.
- My mirror of libelas, which I use for pre-computing the depth maps. I'm working on getting the depth computation to happen on the fly, and I'm investigating other methods for estimating depth from stereo.
Coming soon! Right now the system is a bit tangled up, so it can't be run out of the box without jumping through a lot of hoops. This will change over the course of the next few months.
Important: if you're interested in this project and it's after September 1st 2017, please email me! My email is on my GitHub profile page. I will update the instructions accordingly. Reproducibility is VERY important to me.
Note that the system is under heavy development at the moment, so these
instructions could quickly go out of date. Generally speaking, this project is
built using CMake, and it depends on several submodules. As such, make sure you
don't forget the --recursive flag when cloning the repository. If you did
forget it, just run git submodule update --init --recursive.
- Clone the repository if you haven't already:
git clone --recursive
Install OpenCV 2.4.9 and CUDA (no special version requirements at the moment).
Install the remaining prerequisites (Ubuntu example):
sudo apt-get install libxmu-dev libxi-dev freeglut3 freeglut3-dev glew-utils libglew-dev libglew-dbg
- Build the project in the standard CMake fashion:
mkdir build && cd build && cmake .. && make -j
Grab any raw KITTI data sequence from the official website. Make sure it's a synced+rectified sequence.
Use the MNC pre-trained neural network to process the KITTI sequence. In the future, this will be integrated into the main pipeline, but right now Caffe is a bit capricious.
(TODO; right now many things are hard-coded, sorry :( ) Run the pipeline on the KITTI sequence you downloaded.
./DynSLAM path/to/kitti/sequence
The code follows Google's C++ style guide with the following modifications:
- The column limit is 100 instead of 80, because of the bias towards longer type names in the code base.
- Exceptions are allowed, but must be used judiciously (i.e., for serious errors and exceptional situations).
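To illustrate the exception guideline, here is a small sketch of what "judicious" use might look like: throwing only when the program cannot meaningfully continue. The function `CheckSequencePath` and its error message are hypothetical and do not come from the code base.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: rejects an obviously unusable dataset path before any work starts.
// An empty path is a serious, unrecoverable input error, so an exception is appropriate here.
void CheckSequencePath(const std::string &path) {
  if (path.empty()) {
    throw std::runtime_error("The path to the KITTI sequence must not be empty.");
  }
}
```

Routine conditions, such as a frame in which no object instance was detected, would be better expressed through ordinary return values than through exceptions.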