BAD SLAM

Overview

BAD SLAM is a real-time approach for Simultaneous Localization and Mapping (SLAM) for RGB-D cameras. Supported platforms are Linux and Windows. The software requires an NVIDIA graphics card with CUDA compute capability 5.3 or later (although it would be easy to lower this requirement).

This repository contains the BAD SLAM application and the library it is based on, libvis. The library is a work in progress, and using it for other projects is not recommended at this point.

The application and library code is licensed under the BSD license, but please also note the licenses of the included or externally used third-party components.

If you use the provided code for research, please cite the paper describing the approach:

Thomas Schöps, Torsten Sattler, Marc Pollefeys, "BAD SLAM: Bundle Adjusted Direct RGB-D SLAM", CVPR 2019.

The Windows port and Kinect-for-Azure (K4A) integration have been contributed by Silvano Galliani (Microsoft AI & Vision Zurich).

Screenshots & Videos

Main window	Surfel normals display	Keyframe inspection

Camera requirements

Please keep in mind that BAD SLAM has been designed for high-quality RGB-D videos and is likely to perform badly (no pun intended) on lower-quality RGB-D videos. For more details, see the documentation on camera compatibility.

Pre-built binaries

Windows

For Windows, an executable compiled with Visual Studio 2019 is provided. Please note that, for the moment, it is compiled without K4A support. You must also download the loop closure resource files as described below in this ReadMe; otherwise, loop closures will be disabled. In addition, performing CUDA block-size autotuning, also described below, is recommended.

If the executable fails to start due to missing DLLs, try installing the latest Visual C++ redistributable files for Visual Studio 2019.

Linux

For Linux, an AppImage is provided. Please note that you must also download the loop closure resource files as described below in this ReadMe; otherwise, loop closures will be disabled. In addition, performing CUDA block-size autotuning, also described below, is recommended.

In case you encounter an error like

./badslam: relocation error: [...]/libQt5DBus.so.5: symbol dbus_message_get_allow_interactive_authorization, version LIBDBUS_1_3 not defined in file libdbus-1.so.3 with link time reference

then your dbus library is too old. This can be fixed by downloading a recent version of dbus and setting LD_LIBRARY_PATH to the directory containing its library files before starting the AppImage.

Building

Building has been tested on Ubuntu 14.04 and Ubuntu 18.04 (with gcc), and on Windows (with Visual Studio 2019 and 2017).

The following external dependencies are required.

Dependency	Version(s) known to work
Boost	1.54.0
CUDA	8, 9.1, 10.1
DLib
Eigen	3.3.7
g2o
GLEW
GTest
OpenCV	3.1.0, 3.2.0, 3.4.5, 3.4.6; 4.x does NOT work without changes
OpenGV	in Visual Studio 2017 it compiles only in debug mode
Qt	5.12.0; minimum version: 5.8
SuiteSparse
zlib

Note that OpenCV is only required as a dependency for loop detection by DLib.

The following external dependencies are optional.

Dependency	Purpose
librealsense2	Live input from RealSense D400 series depth cameras.
k4a & k4arecord	Live input from Azure Kinect cameras.

Build instructions for Linux

Since OpenGV (at the time of writing) always uses the -march=native flag, both BAD SLAM and g2o must use this as well. (For g2o, check for the BUILD_WITH_MARCH_NATIVE CMake option.) If there are inconsistencies, the program may crash when OpenGV or g2o functionality is used (i.e., at loop closures).

After obtaining all dependencies, the application can be built with CMake, for example as follows:

mkdir build_RelWithDebInfo
cd build_RelWithDebInfo
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CUDA_FLAGS="-arch=sm_61" ..
make -j badslam  # Reduce the number of threads if running out of memory, e.g., -j3

Make sure to specify suitable CUDA architecture(s) in CMAKE_CUDA_FLAGS. Common settings would either be the CUDA architecture of your graphics card only (in case you only intend to run the compiled application on the system it was compiled on), or a range of virtual architectures (in case the compiled application is intended for distribution). See the corresponding CUDA documentation.
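
As a rough illustration of the two settings described above, the following Python sketch assembles a value for CMAKE_CUDA_FLAGS from compute capabilities. The helper names (sm_number, local_flags, distribution_flags) are hypothetical and not part of this repository, and the -gencode pattern shown is one common nvcc convention; check the CUDA documentation for the flags appropriate to your toolkit version.

```python
from typing import Iterable


def sm_number(capability: str) -> str:
    """Turn a compute capability such as "6.1" into the digits "61"."""
    major, minor = capability.split(".")
    return f"{int(major)}{int(minor)}"


def local_flags(capability: str) -> str:
    """Flags for running only on the machine the code was compiled on."""
    return f"-arch=sm_{sm_number(capability)}"


def distribution_flags(capabilities: Iterable[str]) -> str:
    """Flags covering several architectures (assumed convention, see lead-in).

    Emits machine code for each listed architecture, plus PTX for the
    newest one so that later GPUs can JIT-compile the kernels.
    """
    numbers = sorted({sm_number(c) for c in capabilities}, key=int)
    if not numbers:
        raise ValueError("at least one compute capability is required")
    flags = [f"-gencode arch=compute_{n},code=sm_{n}" for n in numbers]
    newest = numbers[-1]
    flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return " ".join(flags)


if __name__ == "__main__":
    # Matches the example above: a GPU with compute capability 6.1.
    print(local_flags("6.1"))
    print(distribution_flags(["5.3", "6.1"]))
```

The resulting string would be passed as -DCMAKE_CUDA_FLAGS="..." in the cmake call.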

Optionally, after building, the unit tests can be run, which test some of the bundle adjustment functionality. To do so, build and run the following executable:

make -j badslam_test
./build_RelWithDebInfo/applications/badslam/badslam_test

All tests should pass, unless a default CUDA kernel block size does not work for your GPU. Block-size tuning is described below; however, the unit tests do not currently pick up the tuning results. The application has been tested on GTX 1080 and GTX 1070 GPUs.

Build instructions for Windows

The application can be built by creating a Visual Studio 2019 solution for it with CMake, then compiling the "badslam" project in this solution.

It seemed that a workaround was required to prevent some unresolved external symbols in g2o_csparse_extension (for example, duplicating the problematic functions into g2o_solver_csparse).

Dataset format

For CUDA block-size tuning (see below), at least one dataset should be obtained, even if one intends to run the program with live input.

The program supports datasets in the format of the ETH3D SLAM Benchmark for RGB-D videos. This is an extension of the format introduced by the TUM RGB-D benchmark, containing two small additions:

- The original format does not specify the intrinsic camera calibration. BAD SLAM thus additionally expects a file calibration.txt in the dataset directory, consisting of a single line of text structured as follows:
  ```
  fx fy cx cy
  ```
  These values specify the parameters for the pinhole projection (fx * x + cx, fy * y + cy). The coordinate system convention for cx and cy is that the origin (0, 0) of pixel coordinates is at the center of the top-left pixel in the image.
- The associate.py tool from the benchmark must be run as follows to associate the color and depth images:
  ```
  python associate.py rgb.txt depth.txt > associated.txt
  ```
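
To make the calibration convention concrete, here is a minimal Python sketch that parses the single line of calibration.txt and applies the pinhole projection given above. It assumes that x and y in the formula are the normalized image coordinates X/Z and Y/Z of a 3D point in the camera frame; the names Intrinsics, parse_calibration, and project are hypothetical and do not come from the BAD SLAM code.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    fx: float
    fy: float
    cx: float
    cy: float


def parse_calibration(text: str) -> Intrinsics:
    """Parse the contents of calibration.txt: one line 'fx fy cx cy'."""
    values = [float(v) for v in text.split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 values (fx fy cx cy), got {len(values)}")
    return Intrinsics(*values)


def project(intr: Intrinsics, point: Tuple[float, float, float]) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates to pixel coordinates.

    Assumption: x = X / Z and y = Y / Z. Pixel (0, 0) is the center of
    the top-left pixel, as stated in the text.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    x, y = X / Z, Y / Z
    return (intr.fx * x + intr.cx, intr.fy * y + intr.cy)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real dataset.
    intr = parse_calibration("500 500 320 240")
    print(project(intr, (1.0, 0.5, 2.0)))
```

In practice, the text passed to parse_calibration would be read from calibration.txt in the dataset directory.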

Initial setup

After building the executable and obtaining a dataset, there are two more steps to be done before running the program.

First, the resource files for loop closure handling should be set up (unless the parameter --no_loop_detection is used to disable loop detection). Download the resource files of the DLoopDetector demo. The two relevant files from this archive, brief_k10L6.voc and brief_pattern.yml, must be extracted into a directory named "resources" in the application executable's directory (or an analogous symlink must be created), for example:

- build_RelWithDebInfo
  - applications
    - badslam
      - badslam (executable file)
      - resources
        - brief_k10L6.voc (notice that this is compressed in the archive and needs to be extracted separately)
        - brief_pattern.yml
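
A quick way to confirm this layout before starting the program is sketched below in Python. The function name missing_loop_closure_resources is hypothetical; only the directory name "resources" and the two file names come from the description above.

```python
from pathlib import Path
from typing import List

# File names as listed in the directory layout above.
REQUIRED_RESOURCES = ("brief_k10L6.voc", "brief_pattern.yml")


def missing_loop_closure_resources(executable_dir: str) -> List[str]:
    """Return the required resource files that are not present.

    Looks in '<executable_dir>/resources'. Symlinks are followed, so a
    symlinked resources directory or symlinked files also count.
    """
    resources = Path(executable_dir) / "resources"
    return [name for name in REQUIRED_RESOURCES
            if not (resources / name).is_file()]


if __name__ == "__main__":
    missing = missing_loop_closure_resources(
        "build_RelWithDebInfo/applications/badslam")
    if missing:
        print("Missing loop closure resources:", ", ".join(missing))
    else:
        print("All loop closure resources found.")
```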

Second, the CUDA kernel block size auto-tuning should be run. This is not strictly required in case the default sizes work for your GPU, but strongly recommended. This step serves two purposes:

- Sometimes, CUDA kernels won't launch with a given thread block size since this would require too many resources. Block size auto-tuning determines and avoids those problematic configurations.
- The best block sizes for launching CUDA kernels may vary between graphics cards, and the best way to determine them is to benchmark them, which the tuning does.

To test your GPU, run the badslam executable with the provided tuning script on any dataset in sequential mode:

python scripts/auto_tune_parameters.py <path_to_badslam_executable> <path_to_dataset> --sequential_ba --sequential_loop_detection

The script runs the program multiple times with different parameters and measures the runtime, so do not run other compute-intensive tasks at the same time, as they would distort the measurements. It should output a file auto_tuning_result.txt and intermediate files auto_tuning_iteration_X.txt. Move the result file into the resources directory used by BAD SLAM (where the loop detector resources are also stored). The file will be loaded automatically if it exists in this directory. The intermediate files can be deleted.

Since the program runs multiple times, you may want to limit the number of frames it processes with --end_frame to speed up tuning. Also, please note that tuning data will only be gathered for CUDA kernels that run during the tuning. If other kernels run later during the actual program invocation, they will still use the default block size. For example, if you want to tune the PCG-related kernels instead of those for alternating optimization, you need to pass the corresponding parameter --use_pcg in the tuning call. Since the tuning result files are plain text files, the results of multiple tuning runs with different parameters could easily be merged to create a tuning file that covers all kernels. Doing this automatically would be a possible future addition to the tuning script.
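
The options discussed above can be combined into a single tuning call. The Python sketch below only assembles the argument list; the helper tuning_command is hypothetical, and it assumes that --end_frame takes the frame index as a separate value, which should be checked against the documentation on command line parameters.

```python
from typing import List, Optional


def tuning_command(badslam_executable: str,
                   dataset_path: str,
                   end_frame: Optional[int] = None,
                   use_pcg: bool = False,
                   script: str = "scripts/auto_tune_parameters.py") -> List[str]:
    """Build the argument list for one auto-tuning run in sequential mode."""
    cmd = ["python", script, badslam_executable, dataset_path,
           "--sequential_ba", "--sequential_loop_detection"]
    if end_frame is not None:
        # Assumption: the flag is followed by the last frame to process.
        cmd += ["--end_frame", str(end_frame)]
    if use_pcg:
        # Tune the PCG-related kernels instead of alternating optimization.
        cmd.append("--use_pcg")
    return cmd


if __name__ == "__main__":
    # Placeholder paths, to be replaced with real ones.
    print(" ".join(tuning_command("path/to/badslam", "path/to/dataset",
                                  end_frame=200, use_pcg=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True) to start the tuning.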

Running

The simplest way to start the program is without any command-line arguments:

./build_RelWithDebInfo/applications/badslam/badslam

It will then show a settings window that allows selecting a dataset or live input and adjusting a variety of parameters.

Alternatively, the program can be run without visualization by specifying all parameters on the command line. If parameters are given on the command line, the visualization can still be enabled with the --gui flag (to start by showing the settings window) or the --gui_run flag (to start running immediately).

For example, to immediately start running SLAM on a dataset in the GUI, use:

./build_RelWithDebInfo/applications/badslam/badslam <dataset_path> --gui_run

See the documentation on command line parameters for more details.

The first time the program runs on a dataset, the performance might be limited by the time it takes to read the image files from the hard disk (unless the dataset is on an SSD, or is already cached because the files were written recently). Subsequent runs should be faster as long as the files remain cached.

Please also note that the real-time mode with parallel odometry and bundle adjustment, despite being the default, was added late in the development process and should be considered potentially unstable (in particular when optimizing the depth camera's deformation, which lacks synchronization for the access to a GPU buffer). Thus, to possibly increase robustness, use the --sequential_ba parameter. Live operation may still be simulated by also specifying --target_frame_rate <desired_fps>.

Extending BAD SLAM

Contributions to this open source project are very welcome. Please try to follow the existing coding style (which is loosely inspired by the Google C++ coding style, but somewhat relaxed in some aspects).

If you are interested in using the direct bundle adjustment component without SLAM, then the intrinsics optimization unit test might be a good starting point, showing how to set up keyframes and perform optimization. It is at applications/badslam/src/badslam/test/test_intrinsics_optimization_[photometric/geometric]_residual.cc.

If you plan to change the cost function used for bundle adjustment, you may want to have a look at scripts/jacobians_derivation.py. This script automatically computes the Jacobians required for optimization from a specification of the residuals in Python. It also outputs somewhat optimized C++ functions to compute the residual, the Jacobian, and both the residual and Jacobian at the same time. The script requires sympy to run. Its main limitation is that it operates on a symbolic representation of the residual (instead of on the algorithm for residual computation, as an autodiff tool would do), which means that its internal residual term may become huge. This may cause excessive runtimes of the script for more complex residuals. You can try removing the simplify() calls in jacobian_functions.py to speed it up, while applying less simplification to the resulting expressions.
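
To give an idea of the general technique (symbolic differentiation of a residual with sympy, followed by C code generation), here is a small self-contained sketch. It is not taken from scripts/jacobians_derivation.py and does not claim to mirror its structure; the toy residual, a pinhole reprojection error with respect to a 3D point, and all symbol names are chosen purely for illustration.

```python
import sympy as sp

# Symbols for a toy residual: 3D point (X, Y, Z), intrinsics, observed pixel (u, v).
X, Y, Z, fx, fy, cx, cy, u, v = sp.symbols("X Y Z fx fy cx cy u v", real=True)

# Residual: projected pixel position minus observed pixel position.
residual = sp.Matrix([
    fx * X / Z + cx - u,
    fy * Y / Z + cy - v,
])

# Symbolic Jacobian of the residual with respect to the 3D point.
# The simplify() pass is the kind of step that can become slow for
# large expressions and may be skipped at the cost of longer output.
jacobian = residual.jacobian(sp.Matrix([X, Y, Z])).applyfunc(sp.simplify)

# Emit one C expression per Jacobian entry.
for row in range(jacobian.rows):
    for col in range(jacobian.cols):
        print(f"J[{row}][{col}] = {sp.ccode(jacobian[row, col])};")
```

Because the whole residual is held as one symbolic expression, nesting more operations (for example, pose transformations before the projection) makes the expression grow quickly, which is the limitation described above.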

Differences to the paper

The open source version of the code has undergone substantial refactoring compared to the version used to produce the results in the paper; many new features have been added, and many fixes have been made. The photometric residual used for global optimization is slightly different: instead of using the gradient magnitude as the photometric descriptor, the two components of the gradient are used separately. For these reasons, the code should not be expected to reproduce the results in the paper exactly; however, the results should be similar.
