Awesome Visual Localization

A curated list of awesome visual localization resources, inspired by awesome-computer-vision and awesome-visual-localization. Visual localization is the task of estimating the 6-DoF pose (3D position and 3D orientation) of an image given a representation of the world built from a set of reference images. The representation can be a 3D reconstruction, a set of pose-tagged images, or a deep neural network.

This document may contain errors or have missing parts. Feel free to make suggestions or open a pull request. All contributions are much appreciated.

Main Challenges

  • Illumination changes
  • Dynamic scenes with moving objects
  • Long time spans with seasonal changes
  • Occlusion of the scene by an object or person
  • Strong viewpoint difference

Benchmark

Challenges

Tutorial

Category

| Approach | 3D map | Pros | Cons |
|---|---|---|---|
| Structure-based | yes | Performs very well in most scenarios | Challenging in large environments in terms of processing time and memory consumption |
| Structure-based with image retrieval | yes | Improves speed and robustness in large-scale settings | Quality relies heavily on image retrieval |
| Scene point regression | yes/no | Very accurate position in small-scale settings | To be improved in large environments |
| Absolute pose regression | no | Fast pose approximation, can be trained for certain challenges | Low accuracy |
| Pose interpolation | no | Fast and lightweight | Quality relies heavily on image retrieval and only provides a rough pose |
| Relative pose estimation | no | Fast and lightweight | Quality relies heavily on image retrieval and, e.g., local feature matches or a DNN used for relative pose estimation |





Image from https://europe.naverlabs.com/blog/methods-for-visual-localization/
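
To make the "Pose interpolation" row above more concrete, here is a minimal sketch of the idea: the query pose is approximated by blending the poses of the top retrieved reference images. The function name `interpolate_pose`, the input layout, and the use of retrieval similarity scores as weights are assumptions made for illustration, not the method of any specific paper in this list. Orientations are blended with a normalized weighted quaternion sum, which is only a reasonable approximation when the retrieved views have similar orientations.

```python
import math


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def interpolate_pose(retrieved):
    """Blend the poses of retrieved reference images into one rough pose.

    retrieved: list of (weight, position, quaternion) tuples, where
      weight     -- hypothetical retrieval similarity score (> 0)
      position   -- camera position (x, y, z)
      quaternion -- unit quaternion (w, x, y, z)
    Returns (position, quaternion).
    """
    total = sum(w for w, _, _ in retrieved)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted mean of the camera positions.
    position = tuple(
        sum(w * p[i] for w, p, _ in retrieved) / total for i in range(3)
    )

    # q and -q encode the same rotation, so flip every quaternion into
    # the hemisphere of the first one before summing.
    ref = retrieved[0][2]
    acc = [0.0, 0.0, 0.0, 0.0]
    for w, _, q in retrieved:
        sign = 1.0 if sum(a * b for a, b in zip(ref, q)) >= 0 else -1.0
        for i in range(4):
            acc[i] += sign * w * q[i]

    return position, normalize(acc)
```

This reflects the trade-off listed in the table: the computation is cheap and needs no 3D map, but the result can only be as good as the retrieved images and their weights.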

Localization Component

Visual Feature

  • [2020 CVPR] ASLFeat: Learning Local Features of Accurate Shape and Localization [paper]
  • [2020 ECCV] Learning Feature Descriptors Using Camera Pose Supervision [paper]
  • [2019 NeurIPS] R2D2: Repeatable and Reliable Detector and Descriptor [paper]
  • [2019 CVPR] D2-Net: A Trainable CNN for Joint Description and Detection of Local Features [paper]
  • [2019 arXiv] From handcrafted to deep local features [paper]
  • [2018 CVPR] Semantic Visual Localization [paper]
  • [2018 CVPR] SuperPoint: Self-Supervised Interest Point Detection and Description [paper]
  • [2017 CVPR] Comparative Evaluation of Hand-Crafted and Learned Local Features [paper]
  • [2017 ICRA] Semantics-aware visual localization under challenging perceptual conditions [paper]
  • [2004 IJCV] Distinctive Image Features from Scale-Invariant Keypoints [paper]

Image Retrieval

  • [2022 arXiv] Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark [paper]
  • [2019 ICCV] Learning With Average Precision: Training Image Retrieval With a Listwise Loss [paper]
  • [2019 TPAMI] Fine-Tuning CNN Image Retrieval with No Human Annotation [paper]
  • [2017 IJCV] End-to-End Learning of Deep Visual Representations for Image Retrieval [paper]
  • [2016 CVPR] NetVLAD: CNN Architecture for Weakly Supervised Place Recognition [paper]
  • [2015 CVPR] 24/7 Place Recognition by View Synthesis [paper]

Feature Match

  • [2022 arXiv] Is Geometry Enough for Matching in Visual Localization? [paper]
  • [2021 CVPR] LoFTR: Detector-Free Local Feature Matching with Transformers [paper] [code] [project]
  • [2020 ECCV] S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching [paper] [code]
  • [2020 CVPR] SuperGlue: Learning Feature Matching With Graph Neural Networks [paper]
  • [2019 3DV] Sparse-to-Dense Hypercolumn Matching for Long-Term Visual Localization [paper] [code]
  • [2018 ECCV] Semantic Match Consistency for Long-Term Visual Localization [paper]
  • [2017 TPAMI] Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization [paper]
  • [2017 ICCV] Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map [paper]
  • [2014 3DV] Matching Features Correctly through Semantic Understanding [paper]
  • [2008 TPAMI] Optimal Randomized RANSAC [paper]
  • [1981 CACM] Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography [paper]

Pose Computation

  • [2022 CVPR] The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions [paper]
  • [2020 ECCV] Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization [paper] [code]
  • [2011 CVPR] A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation [paper]

Structure From Motion

  • [2016 CVPR] Structure-from-Motion Revisited [paper]
  • [2013 ICCV] Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion [paper]

Localization System

Structure-based

  • [2021 CVPR] Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [paper] [code]
  • [2019 CVPR] Visual Localization by Learning Objects-Of-Interest Dense Match Regression [paper]
  • [2018 CVPR] InLoc: Indoor Visual Localization with Dense Matching and View Synthesis [paper] [code]
  • [2011 ICCV] Fast Image-Based Localization using Direct 2D-to-3D Matching [paper]

Structure-based With Image Retrieval

  • [2022 arXiv] Robust Image Retrieval-based Visual Localization using Kapture [paper] [code]
  • [2020 ECCV Workshop] Hierarchical Localization with hloc and SuperGlue [slides] [code]
  • [2019 CVPR] From Coarse to Fine: Robust Hierarchical Localization at Large Scale [paper] [code]

Scene Point Regression

  • [2021 TPAMI] Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC [paper] [code]
  • [2020 CVPR] Hierarchical Scene Coordinate Classification and Regression for Visual Localization [paper] [code]
  • [2019 ICCV] SANet: Scene Agnostic Network for Camera Localization [paper] [code]
  • [2019 ICCV] Expert Sample Consensus Applied to Camera Re-Localization [paper] [code]
  • [2018 CVPR] Learning Less is More – 6D Camera Localization via 3D Surface Regression [paper] [code]
  • [2017 CVPR] DSAC - Differentiable RANSAC for Camera Localization [paper] [code]
  • [2013 CVPR] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images [paper]

Absolute Pose Regression

  • [2018 ICRA] Deep Auxiliary Learning for Visual Localization and Odometry [paper]
  • [2018 RA-L] VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry [paper]
  • [2018 CVPR] Geometry-Aware Learning of Maps for Camera Localization [paper] [code]
  • [2017 ICCV] Image-based localization using LSTMs for structured feature correlation [paper]
  • [2017 CVPR] Geometric loss functions for camera pose regression with deep learning [paper]
  • [2015 ICCV] PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization [paper]

Pose Interpolation

  • [2019 CVPR] Understanding the Limitations of CNN-based Absolute Camera Pose Regression [paper]
  • [2011 ICCV Workshop] Visual localization by linear combination of image descriptors [paper]

Relative Pose Estimation

  • [2020 ICRA] To Learn or Not to Learn: Visual Localization from Essential Matrices [paper]
  • [2019 ICCV] CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization [paper]
  • [2018 ECCV] RelocNet: Continuous Metric Learning Relocalisation using Neural Nets [paper]
  • [2017 ICCV Workshop] Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network [paper] [code]
  • [2006 3DPVT] Image Based Localization in Urban Environments [paper]