
[TUD AE4270] SuperPoint & SuperGlue on OpenVINS

Primary language: C++. License: GNU General Public License v3.0 (GPL-3.0).

OpenVINS * (SuperPoint + SuperGlue)

This is a small coursework project for AE4270 Control & Operations Project, carried out at MAVLab, TU Delft.

It integrates SuperPoint and SuperGlue into the feature tracking front-end. Currently only the monocular tracking front-end is supported and tested.

Main Results

[Youtube] [More results and analysis] [Report]

The application of SuperPoint and SuperGlue in the front-end of OpenVINS for feature tracking may not be the most effective approach:

  • Preventing tracked key points from being lost is challenging, because SuperGlue lacks the granularity needed for per-feature matching.
  • Even with TensorRT optimization, executing two deep neural networks remains resource-intensive for robots with constrained computational resources. Furthermore, there is no discernible improvement in estimation accuracy, even on difficult sequences with motion blur in the frames.
  • While SuperPoint and SuperGlue may enhance the front-end's robustness to illumination changes, it's worth noting that the IMU already partially compensates for degraded visual perception.
  • SuperPoint and SuperGlue excel in wide baseline pose estimation tasks, but for frame-to-frame tracking, their advantages may not be as pronounced.
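To make the first point concrete, here is a minimal C++ sketch of how a frame-to-frame tracker can propagate feature IDs from a pairwise matcher's output. It assumes a SuperGlue-style result in which matches0[i] holds the index of the current-frame key point matched to previous-frame key point i, or -1 if it is unmatched, and that matches are mutual (no two previous points map to the same current point). The names propagate_ids and TrackUpdate are hypothetical and are not part of OpenVINS or this fork.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of propagating track IDs from the previous frame to the current one.
struct TrackUpdate {
  std::vector<long> curr_ids;  // one ID per current-frame key point
  std::vector<long> lost_ids;  // IDs whose tracks ended in this frame
};

// Hypothetical helper: propagate feature IDs using a pairwise match array.
// matches0[i] is the current-frame index matched to previous key point i, or -1.
TrackUpdate propagate_ids(const std::vector<long>& prev_ids,
                          const std::vector<int>& matches0,
                          std::size_t num_curr, long& next_id) {
  assert(prev_ids.size() == matches0.size());
  TrackUpdate out;
  out.curr_ids.assign(num_curr, -1);
  for (std::size_t i = 0; i < matches0.size(); ++i) {
    const int j = matches0[i];
    if (j >= 0 && static_cast<std::size_t>(j) < num_curr) {
      out.curr_ids[j] = prev_ids[i];  // the track continues into this frame
    } else {
      out.lost_ids.push_back(prev_ids[i]);  // no match: the track ends for good
    }
  }
  // Current-frame key points that received no ID start brand-new tracks.
  for (long& id : out.curr_ids) {
    if (id < 0) id = next_id++;
  }
  return out;
}
```

In this sketch, a point that the matcher misses in a single frame pair loses its ID even if it reappears one frame later; nothing carries an individual feature across the gap.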

To fully exploit SuperPoint and SuperGlue in the visual front-end of a modern pose estimation system, it may be better to use ORB-SLAM as the base codebase. Although this would require building new visual vocabularies and working with a more complex code structure, it could be more effective: ORB-SLAM uses a more elaborate scheme for handling key points and associating them with map points (landmarks), and it appears to rely more on frame-to-model matching, which reduces the risk of losing track of stable key points and may make better use of the properties of SuperPoint and SuperGlue. Using SuperPoint and SuperGlue might also resolve the issues ORB-SLAM3 shows on the TUM-RGBD sequences.
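The frame-to-model idea can be illustrated with a hedged C++ sketch: instead of matching only against the previous frame, each current-frame descriptor is compared with descriptors stored for persistent landmarks, so a landmark missed in one frame can still be re-associated later. This uses brute-force nearest-neighbour search on L2 distance with a hypothetical max_dist threshold and plain std::vector<float> descriptors; it is not ORB-SLAM's actual matching code, which is considerably more involved, and match_to_model is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squared_l2(const Descriptor& a, const Descriptor& b) {
  assert(a.size() == b.size());
  float d = 0.0f;
  for (std::size_t k = 0; k < a.size(); ++k) {
    const float diff = a[k] - b[k];
    d += diff * diff;
  }
  return d;
}

// Hypothetical frame-to-model association: for each current-frame descriptor,
// return the index of the closest landmark, or -1 if none is within max_dist.
// Landmarks persist across frames, so one missed frame does not end a track.
// Note: several frame descriptors may map to the same landmark here; a real
// system would also enforce one-to-one matches.
std::vector<int> match_to_model(const std::vector<Descriptor>& frame,
                                const std::vector<Descriptor>& landmarks,
                                float max_dist) {
  std::vector<int> assoc(frame.size(), -1);
  const float max_sq = max_dist * max_dist;
  for (std::size_t i = 0; i < frame.size(); ++i) {
    float best = std::numeric_limits<float>::max();
    for (std::size_t l = 0; l < landmarks.size(); ++l) {
      const float d = squared_l2(frame[i], landmarks[l]);
      if (d < best) {
        best = d;
        assoc[i] = static_cast<int>(l);
      }
    }
    if (best > max_sq) assoc[i] = -1;  // too far from every landmark
  }
  return assoc;
}
```

A landmark that was not observed in frame k remains in the landmarks list and can still be matched in frame k+1, which is the property the paragraph attributes to frame-to-model matching.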

Env Setup & Compilation

You can test this project inside a virtual environment (Miniforge) without affecting your system installation. Alternatively, you can use a system-wide ROS installation; ROS Noetic is officially recommended.

mamba create -n openvins python=3.9 -y
mamba activate openvins

mamba install ros-noetic-desktop ros-noetic-image-transport-plugins -c robostack-staging -y
mamba deactivate
mamba activate openvins

mamba install compilers cmake pkg-config make ninja colcon-common-extensions catkin_tools ipykernel matplotlib numpy ceres-solver=2.1.0 gcc=11 -y

# pytorch-cuda=11.8 cuda-toolkit=11.8 cuda-nvcc=11.8 are only needed for the libtorch version;
# they are not needed for the Python-wrapper version of SuperGlueCpp
mamba install pytorch=2.0.1 torchvision pytorch-cuda=11.8 cuda-toolkit=11.8 cuda-nvcc=11.8 -c pytorch -c nvidia -y
# use older version of mkl (https://github.com/pytorch/pytorch/issues/123097)
mamba install mkl=2024.0.0 -y

Clone the repository recursively so that its submodules are fetched as well.

mkdir -p catkin_ov/src
cd catkin_ov/src/
git clone https://github.com/ErcBunny/open_vins.git --recursive

Build the SuperGlueCpp project as a third-party library. The openvins environment can be used directly for the build. To build the version that uses the Python wrapper library (rather than libtorch), run the following.

cd SuperGlueCpp
mkdir build
cd build
cmake ..
make

A shared library file should then appear in SuperGlueCpp/lib/. Next, build the ROS workspace.

conda activate openvins
cd catkin_ov/
catkin_make

Run Me

We test our implementation on the EuRoC MAV dataset. For reasons not yet identified, GPU inference cannot be enabled with the "Python wrapper" style implementation, so this project currently only supports running in serial mode.

To reproduce the main results, first create a symbolic link to the dataset folder in ov_msckf/scripts/ so that ov_msckf/scripts/datasets/ contains the euroc_mav directory, where the bag files are stored.

cd ov_msckf/scripts/
ln -s ${ABS_PATH_TO_data_folder} datasets

Now run the following scripts. Results will be saved to ov_msckf/scripts/runs/.

conda activate openvins
cd ov_msckf/scripts/
# KLT
./run_ros_eth_VR_spsg_klt.sh
./run_ros_eth_MH_spsg_klt.sh
# ORB
./run_ros_eth_VR_spsg_orb.sh
./run_ros_eth_MH_spsg_orb.sh
# SuperPoint + SuperGlue
./run_ros_eth_VR_spsg_nn.sh
./run_ros_eth_MH_spsg_nn.sh

In ov_msckf/scripts/runs_oct4_23/ you can find example results (several files have been renamed), obtained on Ubuntu 22 with an Intel Core i9-13900K CPU.

Scripts for Eval

conda activate openvins
cd ov_msckf/scripts/runs_oct4_23/
./eval_err.sh
./eval_timing.sh
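The internals of eval_err.sh are not documented in this README. As a hedged illustration of the kind of metric such an error evaluation reports, here is a C++ sketch of the absolute trajectory error (ATE) computed as a position RMSE. It assumes the estimated and ground-truth positions are already time-associated and aligned (the alignment step, e.g. position-and-yaw for VIO, is omitted), and ate_rmse is a hypothetical helper, not a function from this repository.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Position = std::array<double, 3>;

// Hypothetical helper: ATE position RMSE between time-associated, already-aligned
// estimated and ground-truth positions (alignment itself is not done here).
double ate_rmse(const std::vector<Position>& est, const std::vector<Position>& gt) {
  assert(est.size() == gt.size() && !est.empty());
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < est.size(); ++i) {
    for (std::size_t k = 0; k < 3; ++k) {
      const double e = est[i][k] - gt[i][k];
      sum_sq += e * e;  // accumulates the squared Euclidean error axis by axis
    }
  }
  // Root of the mean squared position error over all poses.
  return std::sqrt(sum_sq / static_cast<double>(est.size()));
}
```

For example, position errors of 0 m and 2 m over two poses give sqrt((0 + 4) / 2) = sqrt(2) m.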

Original README begins...

OpenVINS


Welcome to the OpenVINS project! The OpenVINS project houses some core computer vision code along with a state-of-the-art filter-based visual-inertial estimator. The core filter is an extended Kalman filter which fuses inertial information with sparse visual feature tracks. These visual feature tracks are fused leveraging the Multi-State Constraint Kalman Filter (MSCKF) sliding window formulation, which allows 3D features to update the state estimate without directly estimating the feature states in the filter. Inspired by graph-based optimization systems, the included filter is modular, allowing convenient covariance management with a proper type-based state system. Please take a look at the feature list below for full details on what the system supports.
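The step that lets an MSCKF use a feature without keeping it in the state is a left-nullspace projection of the linearized measurement model. The following is a sketch of the textbook MSCKF formulation, not necessarily the exact form used in the OpenVINS code; the symbols are ours. Stacking all observations of one feature seen from m cloned poses gives a 2m-dimensional residual, and H_f (2m x 3, full column rank) has a left nullspace N with 2m - 3 orthonormal columns:

```latex
\tilde{\mathbf{z}} \simeq \mathbf{H}_x \tilde{\mathbf{x}} + \mathbf{H}_f \, {}^{G}\tilde{\mathbf{p}}_f + \mathbf{n},
\qquad \mathbf{N}^\top \mathbf{H}_f = \mathbf{0}
\;\;\Rightarrow\;\;
\tilde{\mathbf{z}}_o = \mathbf{N}^\top \tilde{\mathbf{z}} \simeq \mathbf{N}^\top \mathbf{H}_x \tilde{\mathbf{x}} + \mathbf{N}^\top \mathbf{n}
```

The projected residual no longer depends on the feature error, so a standard EKF update can be applied with it directly. Because N has orthonormal columns, isotropic noise with covariance sigma^2 I keeps covariance sigma^2 I after projection.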

News / Events

  • May 11, 2023 - Inertial intrinsic support released as part of v2.7 along with a few bug fixes and improvements to stereo KLT tracking. Please check out the release page for details.
  • April 15, 2023 - Minor update to v2.6.3 to support incremental feature triangulation of active features for downstream applications, faster zero-velocity update, small bug fixes, some example realsense configurations, and cached fast state prediction. Please check out the release page for details.
  • April 3, 2023 - We have released a monocular plane-aided VINS, termed ov_plane, which leverages the OpenVINS project. Both now support the released Indoor AR Table dataset.
  • July 14, 2022 - Improved feature extraction logic for >100 Hz tracking, some bug fixes and updated scripts. See v2.6.1 PR#259 and v2.6.2 PR#264.
  • March 14, 2022 - Initial dynamic initialization open sourcing, asynchronous subscription to inertial readings and publishing of odometry, support for lower frequency feature tracking. See v2.6 PR#232 for details.
  • December 13, 2021 - New YAML configuration system, ROS2 support, Docker images, robust static initialization based on disparity, internal logging system to reduce verbosity, image transport publishers, dynamic number of features support, and other small fixes. See v2.5 PR#209 for details.
  • July 19, 2021 - Camera classes, masking support, alignment utility, and other small fixes. See v2.4 PR#117 for details.
  • December 1, 2020 - Released improved memory management, active feature pointcloud publishing, limiting number of features in update to bound compute, and other small fixes. See v2.3 PR#117 for details.
  • November 18, 2020 - Released the groundtruth generation utility package vicon2gt, which enables the creation of groundtruth trajectories in a motion capture room for evaluating VIO methods.
  • July 7, 2020 - Released zero velocity update for vehicle applications and direct initialization when standing still. See PR#79 for details.
  • May 18, 2020 - Released the secondary pose graph example repository ov_secondary, based on VINS-Fusion. OpenVINS now publishes marginalized feature tracks, feature 3D positions, and first camera intrinsics and extrinsics. See PR#66 for details and discussion.
  • April 3, 2020 - Released v2.0 update to the codebase with some key refactoring, ROS-free building, improved dataset support, and single inverse depth feature representation. Please check out the release page for details.
  • January 21, 2020 - Our paper has been accepted for presentation in ICRA 2020. We look forward to seeing everybody there! We have also added links to a few videos of the system running on different datasets.
  • October 23, 2019 - OpenVINS placed first in the IROS 2019 FPV Drone Racing VIO Competition. We will be giving a short presentation at the workshop at 12:45pm in Macau on November 8th.
  • October 1, 2019 - We will be presenting at the Visual-Inertial Navigation: Challenges and Applications workshop at IROS 2019. The submitted workshop paper can be found at this link.
  • August 21, 2019 - Open sourced ov_maplab for interfacing OpenVINS with the maplab library.
  • August 15, 2019 - Initial release of OpenVINS repository and documentation website!

Project Features

  • Sliding window visual-inertial MSCKF
  • Modular covariance type system
  • Comprehensive documentation and derivations
  • Extendable visual-inertial simulator
    • On manifold SE(3) b-spline
    • Arbitrary number of cameras
    • Arbitrary sensor rate
    • Automatic feature generation
  • Six different feature representations
    1. Global XYZ
    2. Global inverse depth
    3. Anchored XYZ
    4. Anchored inverse depth
    5. Anchored MSCKF inverse depth
    6. Anchored single inverse depth
  • Calibration of sensor intrinsics and extrinsics
    • Camera to IMU transform
    • Camera to IMU time offset
    • Camera intrinsics
    • Inertial intrinsics (including g-sensitivity)
  • Environmental SLAM features
    • OpenCV ARUCO tag SLAM features
    • Sparse feature SLAM features
  • Visual tracking support
    • Monocular camera
    • Stereo camera (synchronized)
    • Binocular cameras (synchronized)
    • KLT or descriptor based
    • Masked tracking
  • Static and dynamic state initialization
  • Zero velocity detection and updates
  • Out-of-the-box evaluation on EuRoC MAV, TUM-VI, UZH-FPV, KAIST Urban and other VIO datasets
  • Extensive evaluation suite (ATE, RPE, NEES, RMSE, etc.)
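To make two of the feature representations listed above concrete, the following C++ sketch converts them to a 3D point. It assumes the parameterizations as we understand them from the OpenVINS documentation: anchored MSCKF inverse depth [alpha, beta, rho] maps to (1/rho) * [alpha, beta, 1] in the anchor camera frame, and global inverse depth [theta, phi, rho] maps to (1/rho) * [cos(theta) sin(phi), sin(theta) sin(phi), cos(phi)] in the global frame. Treat these conventions as assumptions and check them against the documentation before relying on them; both function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Anchored MSCKF inverse depth [alpha, beta, rho] -> XYZ in the anchor camera frame.
// alpha, beta are normalized image coordinates (x/z, y/z); rho is inverse depth 1/z.
Vec3 anchored_msckf_invdepth_to_xyz(const Vec3& abr) {
  const double rho = abr[2];
  assert(rho > 0.0);  // the point must lie in front of the anchor camera
  return {abr[0] / rho, abr[1] / rho, 1.0 / rho};
}

// Global inverse depth [theta, phi, rho] -> XYZ in the global frame: a spherical
// bearing (theta: azimuth, phi: polar angle from +z) scaled by the depth 1/rho.
Vec3 global_invdepth_to_xyz(const Vec3& tpr) {
  const double theta = tpr[0], phi = tpr[1], rho = tpr[2];
  assert(rho > 0.0);
  return {std::cos(theta) * std::sin(phi) / rho,
          std::sin(theta) * std::sin(phi) / rho,
          std::cos(phi) / rho};
}
```

For instance, [0.5, -0.25, 0.5] in the anchored MSCKF form corresponds to the anchor-frame point (1, -0.5, 2).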

Codebase Extensions

  • ov_plane - A real-time monocular visual-inertial odometry (VIO) system which leverages environmental planes. At its core is an efficient and robust monocular plane detection algorithm which does not require additional sensing modalities such as a stereo or depth camera, or a neural network. The plane detection and tracking algorithm enables real-time regularization of point features to environmental planes, which are either maintained in the state vector as long-lived planes or marginalized for efficiency. Planar regularities are applied to both in-state SLAM and out-of-state MSCKF point features, enabling long-term point-to-plane loop closures due to the large spatial volume of planes.

  • vicon2gt - This utility was created to generate groundtruth trajectories using a motion capture system (e.g. Vicon or OptiTrack) for use in evaluating visual-inertial estimation systems. Specifically, we calculate the inertial IMU state (full 15 DoF) at camera frequency and generate a groundtruth trajectory similar to those provided by the EuRoC MAV datasets. It fuses inertial and motion capture information and estimates all unknown spatial-temporal calibrations between the two sensors.

  • ov_maplab - This codebase contains the interface wrapper for exporting visual-inertial runs from OpenVINS into the ViMap structure used by maplab. The state estimates and raw images are appended to the ViMap as OpenVINS runs through a dataset. After completion of the dataset, features are re-extracted and triangulated with maplab's feature system. This can be used to merge multi-session maps, or to perform a batch optimization after first running the data through OpenVINS. Some examples have been provided along with a helper script to export trajectories into the standard groundtruth format.

  • ov_secondary - This is an example secondary thread which provides loop closure in a loosely coupled manner for OpenVINS. It is a modification of code originally developed by the HKUST aerial robotics group, which can be found in their VINS-Fusion repository. Here we stress that this is a loosely coupled method, thus no information is returned to the estimator to improve the underlying OpenVINS odometry. This codebase has been modified in a few key areas, including: exposing more loop closure parameters, subscribing to camera intrinsics, simplifying configuration such that only topics need to be supplied, and some tweaks to the loop closure detection to improve frequency.



Credit / Licensing

This code was written by the Robot Perception and Navigation Group (RPNG) at the University of Delaware. If you have any issues with the code, please open an issue on our GitHub page with relevant implementation details and references. For researchers that have leveraged or compared to this work, please cite the following:

@Conference{Geneva2020ICRA,
  Title      = {{OpenVINS}: A Research Platform for Visual-Inertial Estimation},
  Author     = {Patrick Geneva and Kevin Eckenhoff and Woosik Lee and Yulin Yang and Guoquan Huang},
  Booktitle  = {Proc. of the IEEE International Conference on Robotics and Automation},
  Year       = {2020},
  Address    = {Paris, France},
  Url        = {\url{https://github.com/rpng/open_vins}}
}

The codebase and documentation are licensed under the GNU General Public License v3 (GPL-3). You must preserve the copyright and license notices in your derivative work and make available the complete source code with modifications under the same license (see this; this is not legal advice).