A curated list of awesome visual localization resources, inspired by awesome-computer-vision and awesome-visual-localization. Visual localization is the task of estimating the 6-DoF camera pose (3D position and 3D orientation) of an image, given a representation of the world built from a set of reference images. The representation can be a 3D reconstruction, a set of pose-tagged images, or a deep neural network.
This document may contain errors or omissions. Feel free to make suggestions or open a pull request. All contributions are much appreciated.
- Illumination changes
- Dynamic scenes with moving objects
- Long time spans covering different seasons
- Occlusion of the scene by an object or person
- Strong viewpoint difference
- [2022 ECCV] Map-Based Localization for Autonomous Driving
- [2021 ICCV] Long-Term Visual Localization under Changing Conditions
- [2021 ICCV] Map-Based Localization for Autonomous Driving
- [2020 ECCV] Long-Term Visual Localization under Changing Conditions
- [2020 ECCV] Map-Based Localization for Autonomous Driving
- [2019 CVPR] Long-Term Visual Localization under Changing Conditions
Approach | 3D map | Pros | Cons |
---|---|---|---|
Structure-based | yes | Performs very well in most scenarios | Processing time and memory consumption become challenging in large environments |
Structure-based with image retrieval | yes | Improves speed and robustness in large-scale settings | Quality relies heavily on image retrieval |
Scene point regression | yes/no | Very accurate position in small-scale settings | Accuracy still needs improvement in large environments |
Absolute pose regression | no | Fast pose approximation, can be trained for specific challenges | Low accuracy |
Pose interpolation | no | Fast and lightweight | Quality relies heavily on image retrieval and only provides a rough pose |
Relative pose estimation | no | Fast and lightweight | Quality relies heavily on image retrieval and on the method used for relative pose estimation, e.g., local feature matches or a DNN |
Image from https://europe.naverlabs.com/blog/methods-for-visual-localization/
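
To make the "Pose interpolation" row more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method of any paper listed here: the function names (`cosine_similarity`, `interpolate_pose`), the reference tuple layout, the `(w, x, y, z)` quaternion convention, and the similarity-weighted averaging scheme are all hypothetical choices. The query pose is approximated by blending the poses of the `k` reference images whose global descriptors are most similar to the query descriptor.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two global image descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def interpolate_pose(query_desc, references, k=2):
    """Approximate a query pose from the k most similar reference images.

    references: list of (descriptor, position_xyz, unit_quaternion_wxyz).
    Returns (position, quaternion). This tuple layout is an assumption
    made for this sketch.
    """
    # Image retrieval step: rank references by descriptor similarity.
    scored = sorted(
        ((cosine_similarity(query_desc, d), p, q) for d, p, q in references),
        key=lambda item: item[0],
        reverse=True,
    )[:k]

    # Use non-negative similarities as blending weights.
    weights = [max(score, 0.0) for score, _, _ in scored]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no reference image is similar to the query")
    weights = [w / total for w in weights]

    # Position: weighted mean of the retrieved camera positions.
    position = [
        sum(w * p[i] for w, (_, p, _) in zip(weights, scored))
        for i in range(3)
    ]

    # Rotation: sign-aligned, weighted quaternion sum, then renormalized.
    # This is only a reasonable approximation for nearby orientations.
    anchor = scored[0][2]
    acc = [0.0, 0.0, 0.0, 0.0]
    for w, (_, _, q) in zip(weights, scored):
        sign = 1.0 if sum(a * b for a, b in zip(q, anchor)) >= 0.0 else -1.0
        for i in range(4):
            acc[i] += w * sign * q[i]
    norm = math.sqrt(sum(x * x for x in acc))
    rotation = [x / norm for x in acc]
    return position, rotation
```

As the table notes, the result depends entirely on the retrieved images: if retrieval returns images from the wrong place, the blended pose is wrong as well, and even with correct retrieval the output is only a rough pose.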
- [2020 CVPR] ASLFeat: Learning Local Features of Accurate Shape and Localization [paper]
- [2020 ECCV] Learning Feature Descriptors Using Camera Pose Supervision [paper]
- [2019 NeurIPS] R2D2: Reliable and Repeatable Detector and Descriptor [paper]
- [2019 CVPR] D2-Net: A Trainable CNN for Joint Description and Detection of Local Features [paper]
- [2019 arXiv] From handcrafted to deep local features [paper]
- [2018 CVPR] Semantic Visual Localization [paper]
- [2018 CVPR] SuperPoint: Self-Supervised Interest Point Detection and Description [paper]
- [2017 CVPR] Comparative Evaluation of Hand-Crafted and Learned Local Features [paper]
- [2017 ICRA] Semantics-aware visual localization under challenging perceptual conditions [paper]
- [2004 IJCV] Distinctive Image Features from Scale-Invariant Keypoints [paper]
- [2022 arXiv] Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark [paper]
- [2019 ICCV] Learning With Average Precision: Training Image Retrieval With a Listwise Loss [paper]
- [2019 TPAMI] Fine-Tuning CNN Image Retrieval with No Human Annotation [paper]
- [2017 IJCV] End-to-End Learning of Deep Visual Representations for Image Retrieval [paper]
- [2016 CVPR] NetVLAD: CNN Architecture for Weakly Supervised Place Recognition [paper]
- [2015 CVPR] 24/7 Place Recognition by View Synthesis [paper]
- [2022 arXiv] Is Geometry Enough for Matching in Visual Localization? [paper]
- [2021 CVPR] LoFTR: Detector-Free Local Feature Matching with Transformers [paper] [code] [project]
- [2020 ECCV] S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching [paper] [code]
- [2020 CVPR] SuperGlue: Learning Feature Matching With Graph Neural Networks [paper]
- [2019 3DV] Sparse-to-Dense Hypercolumn Matching for Long-Term Visual Localization [paper] [code]
- [2018 ECCV] Semantic Match Consistency for Long-Term Visual Localization [paper]
- [2017 TPAMI] Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization [paper]
- [2017 ICCV] Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map [paper]
- [2014 3DV] Matching Features Correctly through Semantic Understanding [paper]
- [2008 TPAMI] Optimal Randomized RANSAC [paper]
- [1981 CACM] Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography [paper]
- [2022 CVPR] The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions [paper]
- [2020 ECCV] Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization [paper] [code]
- [2011 CVPR] A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation [paper]
- [2016 CVPR] Structure-from-Motion Revisited [paper]
- [2013 ICCV] Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion [paper]
- [2021 CVPR] Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [paper] [code]
- [2019 CVPR] Visual Localization by Learning Objects-Of-Interest Dense Match Regression [paper]
- [2018 CVPR] InLoc: Indoor Visual Localization with Dense Matching and View Synthesis [paper] [code]
- [2011 ICCV] Fast Image-Based Localization using Direct 2D-to-3D Matching [paper]
- [2022 arXiv] Robust Image Retrieval-based Visual Localization using Kapture [paper] [code]
- [2020 ECCV Workshop] Hierarchical Localization with hloc and SuperGlue [slides] [code]
- [2019 CVPR] From Coarse to Fine: Robust Hierarchical Localization at Large Scale [paper] [code]
- [2021 TPAMI] Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC [paper] [code]
- [2020 CVPR] Hierarchical Scene Coordinate Classification and Regression for Visual Localization [paper] [code]
- [2019 ICCV] SANet: Scene Agnostic Network for Camera Localization [paper] [code]
- [2019 ICCV] Expert Sample Consensus Applied to Camera Re-Localization [paper] [code]
- [2018 CVPR] Learning Less is More – 6D Camera Localization via 3D Surface Regression [paper] [code]
- [2017 CVPR] DSAC - Differentiable RANSAC for Camera Localization [paper] [code]
- [2013 CVPR] Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images [paper]
- [2018 ICRA] Deep Auxiliary Learning for Visual Localization and Odometry [paper]
- [2018 RA-L] VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry [paper]
- [2018 CVPR] Geometry-Aware Learning of Maps for Camera Localization [paper] [code]
- [2017 ICCV] Image-based localization using LSTMs for structured feature correlation [paper]
- [2017 CVPR] Geometric loss functions for camera pose regression with deep learning [paper]
- [2015 ICCV] PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization [paper]
- [2019 CVPR] Understanding the Limitations of CNN-based Absolute Camera Pose Regression [paper]
- [2011 ICCV Workshop] Visual localization by linear combination of image descriptors [paper]
- [2020 ICRA] To Learn or Not to Learn: Visual Localization from Essential Matrices [paper]
- [2019 ICCV] CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization [paper]
- [2018 ECCV] RelocNet: Continuous Metric Learning Relocalisation using Neural Nets [paper]
- [2017 ICCV Workshop] Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network [paper] [code]
- [2006 3DPVT] Image Based Localization in Urban Environments [paper]