
A paper list of visual re-localization algorithms

Reviewing Monocular Re-Localization From the Perspective of Scene Map

This repo contains a curated list of monocular re-localization algorithms, categorized into five classes according to the scene map they use. A comprehensive review can be found in our survey:

Jinyu Miao, Kun Jiang, Tuopu Wen, Yunlong Wang, Peijing Jia, Xuhe Zhao, Qian Cheng, Zhongyang Xiao, Jin Huang, Zhihua Zhong, Diange Yang, "A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation," arXiv preprint arXiv:2311.15643, 2023.

Categorization of the scene maps utilized in monocular re-localization.


A Geo-tagged Frame Map

A Geo-tagged Frame Map is composed of posed keyframes, i.e., reference images with known poses.

Re-localization algorithms built on this map can be classified into two categories: Visual Place Recognition (VPR) and Relative Pose Estimation (RPR).

A-1 Visual Place Recognition

Given the current query image, VPR identifies a re-observed place by retrieving the most similar reference image(s) when the vehicle returns to a previously visited scene. It is often used as the coarse step in a hierarchical localization pipeline, or as the Loop Closure Detection (LCD) module in a Simultaneous Localization and Mapping (SLAM) system. The pose of the retrieved reference image(s) can be regarded as an approximate pose of the query image.
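As a rough illustration of the retrieval step, the sketch below ranks reference keyframes by cosine similarity between global descriptors and returns the pose of the top match as the approximate query pose. The toy descriptors, the `(x, y, yaw)` pose tuple, and all function names are hypothetical; practical systems typically use learned or aggregated descriptors and an approximate nearest-neighbor index over a large database.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, float, float]  # hypothetical (x, y, yaw) pose of a reference keyframe


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Sequence[float], refs: List[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k reference descriptors most similar to the query, best first."""
    scores = [cosine_similarity(query, r) for r in refs]
    ranked = sorted(range(len(refs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def approximate_pose(query: Sequence[float], refs: List[Sequence[float]], ref_poses: List[Pose]) -> Pose:
    """Use the pose of the best-matching keyframe as a coarse estimate of the query pose."""
    best = retrieve_top_k(query, refs, k=1)[0]
    return ref_poses[best]


ref_descs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ref_poses = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 1.57)]
query = [0.1, 0.9, 0.0]
print(retrieve_top_k(query, ref_descs, k=2))           # [1, 0]
print(approximate_pose(query, ref_descs, ref_poses))   # (10.0, 0.0, 0.0)
```

In a hierarchical pipeline, the top-k retrieved keyframes (rather than only the best one) would usually be passed on to a finer pose estimation stage.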

Global feature-based visual place recognition

Local feature-based visual place recognition

Sequence feature-based visual place recognition

Semantic-based visual place recognition

A-2 Relative Pose Estimation

RPR methods aim to estimate the relative pose between the query image and a reference image in the map.
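Once a relative pose has been estimated, it is commonly chained with the known pose of the reference keyframe to obtain the query pose in the map frame. The minimal sketch below assumes 4x4 homogeneous transforms with the convention that `T_a_b` maps points from frame b into frame a; the helper names are hypothetical. Note that two-view geometry from a monocular camera (e.g., via the essential matrix) recovers translation only up to scale, so the metric scale has to come from another source.

```python
import math
from typing import List

Matrix4 = List[List[float]]


def matmul4(a: Matrix4, b: Matrix4) -> Matrix4:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def pose_from_yaw_translation(yaw: float, tx: float, ty: float, tz: float) -> Matrix4:
    """Illustrative helper: homogeneous transform with a rotation about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def query_pose_in_map(T_map_ref: Matrix4, T_ref_query: Matrix4) -> Matrix4:
    """Chain the reference keyframe pose with the estimated relative pose: T_map_query = T_map_ref * T_ref_query."""
    return matmul4(T_map_ref, T_ref_query)


T_map_ref = pose_from_yaw_translation(math.pi / 2, 1.0, 0.0, 0.0)  # known reference keyframe pose
T_ref_query = pose_from_yaw_translation(0.0, 2.0, 0.0, 0.0)        # estimated relative pose
T_map_query = query_pose_in_map(T_map_ref, T_ref_query)
print([round(T_map_query[i][3], 6) for i in range(3)])  # [1.0, 2.0, 0.0]
```

The 2 m forward offset in the reference frame becomes a 2 m offset along the map's y-axis because the reference keyframe is rotated by 90 degrees.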

Geometry-based relative pose estimation

Regression-based relative pose estimation

B Visual Landmark Map

A Visual Landmark Map is composed of visual landmarks: informative and representative 3D points lifted from 2D pixels by 3D reconstruction. Each landmark is associated with the corresponding local features (a 2D keypoint and a high-dimensional descriptor) in the reference images that observe it. During localization, the query image is first matched with reference image(s), and the resulting 2D-2D matches are lifted to 2D-3D matches between the query image and the visual landmark map. These 2D-3D matches are then used to solve for the pose, at the scale of the map, as a typical Perspective-n-Point (PnP) problem.
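The lifting step can be sketched as a lookup from matched reference keypoints to the landmarks they observe, followed by the reprojection error that PnP-with-RANSAC pipelines use to score inliers and refine a pose. This is a hypothetical, dependency-free illustration: the map layout, the pinhole intrinsics tuple, and the function names are assumptions. The pose itself would come from an actual PnP solver, for example OpenCV's `solvePnPRansac`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
KeypointId = Tuple[int, int]  # hypothetical (reference image id, keypoint index)


def lift_matches(
    matches: List[Tuple[Point2, KeypointId]],
    keypoint_to_landmark: Dict[KeypointId, int],
    landmarks: List[Point3],
) -> List[Tuple[Point2, Point3]]:
    """Turn 2D-2D query/reference matches into 2D-3D query/landmark correspondences.

    Reference keypoints that were never triangulated into a landmark are dropped.
    """
    correspondences = []
    for query_xy, ref_kp in matches:
        landmark_id: Optional[int] = keypoint_to_landmark.get(ref_kp)
        if landmark_id is not None:
            correspondences.append((query_xy, landmarks[landmark_id]))
    return correspondences


def project(R: Sequence[Sequence[float]], t: Sequence[float], K: Tuple[float, float, float, float], X: Point3) -> Point2:
    """Pinhole projection of a map point X for camera pose (R, t), with x_cam = R X + t."""
    fx, fy, cx, cy = K
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def mean_reprojection_error(R, t, K, correspondences: List[Tuple[Point2, Point3]]) -> float:
    """Average pixel distance between observed query keypoints and reprojected landmarks."""
    if not correspondences:
        raise ValueError("need at least one 2D-3D correspondence")
    total = 0.0
    for (u, v), X in correspondences:
        pu, pv = project(R, t, K, X)
        total += ((pu - u) ** 2 + (pv - v) ** 2) ** 0.5
    return total / len(correspondences)


landmarks = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
keypoint_to_landmark = {(0, 5): 0, (0, 9): 1}
matches = [((50.0, 50.0), (0, 5)), ((100.0, 50.0), (0, 9)), ((10.0, 10.0), (0, 7))]
corr = lift_matches(matches, keypoint_to_landmark, landmarks)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
K = (100.0, 100.0, 50.0, 50.0)
print(len(corr))                                  # 2 (keypoint (0, 7) has no landmark)
print(mean_reprojection_error(R, t, K, corr))     # 0.0
```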

B-1 Local Feature Extraction-then-Matching

Local feature extraction

Local feature matching

B-2 Joint Local Feature Extraction-and-Matching

B-3 Pose Solver

B-4 Further Improvements

Cross descriptor matching

Line feature

Dense CNN matching

Localization without SfM

Map squeeze

Pose verification and correction

Open-source toolboxes

C Point Cloud Map

A Point Cloud Map contains only the 3D positions of points and, sometimes, their intensities. Monocular localization in a point cloud map is also called cross-modal localization.

C-1 Geometry-based Cross-modal Localization

C-2 Learning-based Cross-modal Localization

Cross-modal visual place recognition

Cross-modal relative pose regression

Cross-modal matching-based localization

D Vectorized HD Map

The localization features in an HD Map include dense point clouds and sparse map elements. Here we focus on the sparse map elements, which are usually represented as vectors with semantic labels.
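A vectorized map element can be pictured as a semantic label attached to a polyline. The sketch below uses a hypothetical `MapElement` dataclass and computes the distance from a 2D point (e.g., a detected lane-marking point already transformed into the map frame) to the nearest segment of an element. This is one possible building block for scoring a candidate pose against such a map, offered as an illustration only, not as the representation of any particular HD map format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point2 = Tuple[float, float]


@dataclass
class MapElement:
    """Hypothetical sparse HD-map element: a semantic label plus a polyline in map coordinates."""
    label: str            # e.g. "lane_divider", "stop_line"
    points: List[Point2]  # ordered polyline vertices


def point_to_segment(p: Point2, a: Point2, b: Point2) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a single point
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    # Parameter of p's projection onto the line through a and b, clamped to the segment.
    s = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    qx, qy = ax + s * dx, ay + s * dy
    return ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5


def distance_to_element(p: Point2, element: MapElement) -> float:
    """Distance from p to the closest segment of the element's polyline."""
    pts = element.points
    if not pts:
        raise ValueError("map element has no vertices")
    if len(pts) == 1:
        return point_to_segment(p, pts[0], pts[0])
    return min(point_to_segment(p, pts[i], pts[i + 1]) for i in range(len(pts) - 1))


divider = MapElement("lane_divider", [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)])
print(distance_to_element((5.0, 3.0), divider))   # 3.0
print(distance_to_element((12.0, 5.0), divider))  # 2.0
```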

E Learnt Implicit Map

Some recent works implicitly encode the scene map into neural networks, which can then directly regress the pose of an image (Absolute Pose Regression, APR), estimate the 3D scene coordinates of each pixel in an image (Scene Coordinate Regression, SCR), or render the geometric structure and appearance of the scene (Neural Radiance Field, NeRF). We refer to such scene-map information stored in a trained neural network as a Learnt Implicit Map.
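To make the APR idea more concrete, the sketch below evaluates a PoseNet-style training loss: the Euclidean error between predicted and ground-truth camera positions plus a weighted error between the normalized predicted quaternion and the ground-truth unit quaternion. The weight `beta` and the function names are illustrative assumptions; the actual loss and its weighting vary across APR methods.

```python
import math
from typing import Sequence


def l2(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def apr_loss(
    t_pred: Sequence[float],
    q_pred: Sequence[float],
    t_gt: Sequence[float],
    q_gt: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Translation error plus beta-weighted quaternion error.

    The predicted quaternion is normalized before comparison; q_gt is assumed to be unit length.
    Note: q and -q encode the same rotation, which this simple loss does not account for.
    """
    q_norm = l2(q_pred)
    if q_norm == 0.0:
        raise ValueError("predicted quaternion must be non-zero")
    q_unit = [x / q_norm for x in q_pred]
    trans_err = l2([a - b for a, b in zip(t_pred, t_gt)])
    rot_err = l2([a - b for a, b in zip(q_unit, q_gt)])
    return trans_err + beta * rot_err


# 3-4-5 translation error, and a scaled but otherwise correct quaternion (zero rotation error).
print(apr_loss([3.0, 4.0, 0.0], [2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]))  # 5.0
```

Because translation (in meters) and quaternion differences live on different scales, the choice of `beta` matters in practice, which is one reason later APR methods explore other weightings.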

E-1 Absolute Pose Regression

Single scene absolute pose regression

Multiple scene absolute pose regression

E-2 Scene Coordinate Regression

Scene-specific SCR

Scene-agnostic SCR

E-3 Neural Radiance Field

NeRF as pose estimator

NeRF as data augmentation

If you find this repository and survey helpful, please consider citing our survey:

      title={A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation}, 
      author={Jinyu Miao, Kun Jiang, Tuopu Wen, Yunlong Wang, Peijing Jia, Benny Wijaya, Xuhe Zhao, Qian Cheng, Zhongyang Xiao, Jin Huang, Zhihua Zhong, Diange Yang},
      journal={arXiv preprint arXiv:2311.15643},