
D2S: Representing sparse descriptors and scene coordinates for visual relocalization


D2S is a learning-based visual relocalization method. It learns to generate 3D scene coordinates from sparse descriptors extracted from a single image. Once trained on a specific sparse SfM scene, D2S can accurately estimate the camera's position and orientation from a new image of that scene. D2S also acts as a natural filter for outliers and non-robust descriptors, which improves the accuracy of the subsequent localization.

This repository contains the PyTorch implementation of our papers.


D2S is based on PyTorch. The main framework, including data processing and parameter configuration, is implemented in Python. D2S requires the following Python packages; the versions we tested with are given in parentheses.

pytorch (>=1.7.0)
opencv (4.7.0)
Pillow (9.4.0)
h5py (3.8.0)
visdom (0.2.4)

You can also install the environment using this command:

conda env create -f environment.yml
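After installation, it can help to confirm what is actually installed. The following is a minimal sketch using only the standard library. The distribution names (`torch`, `opencv-python`) are assumptions, since the list above names the packages as `pytorch` and `opencv`, and treating "at least the tested version" as acceptable is also an assumption for every package except pytorch.

```python
from importlib import metadata

# Tested versions from the list above. The keys are assumed PyPI
# distribution names ("pytorch" -> "torch", "opencv" -> "opencv-python").
REQUIRED = {
    "torch": "1.7.0",
    "opencv-python": "4.7.0",
    "Pillow": "9.4.0",
    "h5py": "3.8.0",
    "visdom": "0.2.4",
}


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(required=REQUIRED):
    """Map each package to (installed version or None, status string)."""
    report = {}
    for name, tested in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, "missing")
            continue
        ok = parse_version(installed) >= parse_version(tested)
        report[name] = (installed, "ok" if ok else "older than tested")
    return report


if __name__ == "__main__":
    for name, (installed, status) in check_environment().items():
        print(f"{name}: {installed} ({status})")
```

A package reported as missing may simply be installed under a different distribution name (for example a conda build of OpenCV), so treat the report as a hint rather than a verdict.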

D2S uses the Hierarchical Localization toolbox (hloc) to label descriptors with their coordinates. Please download this toolbox into the third_party folder as follows:

├── third_party
│   ├── Hierarchical_Localization

To install hloc, you can reuse the D2S environment; you only need to install the additional Python packages that hloc requires.
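This README spells the toolbox folder in two ways: `Hierarchical_Localization` in the folder tree above and in the default dataset path, and `Hierarchical-Localization` in the default hloc output path. A small sketch like the one below shows which spelling exists on disk. The helper name `find_hloc` is hypothetical and not part of D2S.

```python
from pathlib import Path

# Both spellings appear in this README; which one the scripts expect is
# not confirmed here, so the check simply reports what is on disk.
CANDIDATE_NAMES = ("Hierarchical_Localization", "Hierarchical-Localization")


def find_hloc(repo_root="."):
    """Return the existing hloc folders under <repo_root>/third_party."""
    third_party = Path(repo_root) / "third_party"
    return [third_party / name for name in CANDIDATE_NAMES
            if (third_party / name).is_dir()]
```

If the returned list is empty, or holds a spelling different from the one in the default paths you rely on, pass the paths explicitly on the command line rather than relying on the defaults.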


Supported Datasets

Data Preprocessing

  1. You need to run the hloc pipeline to generate the SfM models for each dataset. For example, for the 7scenes and Cambridge Landmarks datasets, you can run the code provided by hloc by following the 7scenes pipeline and Cambridge pipeline guides. Note that D2S has been tested with a fixed 2048 SuperPoint descriptors per image; please configure this in hloc before running it so that the data are produced correctly. The remaining datasets are not supported by hloc, so we will provide a script to run hloc on them later. For those datasets, please create the folder and run the commands to generate the SfM models in the same way as for 7scenes and Cambridge.

  2. Now you can generate the training and testing data using this script. Please configure the dataset and scene name in the preprocessing.py file before running it:

cd processing
python preprocessing.py --dataset_dir <path to dataset folder> --dataset <name of the dataset> --scene <name of the scene>
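Step 1 asks for 2048 SuperPoint descriptors per image but does not show how to set this in hloc. The sketch below illustrates one way to derive such a configuration. The dictionary layout is an assumption modelled on the SuperPoint entries in hloc's feature-extraction configs, so compare it with your hloc version before use. The helper `with_max_keypoints` is hypothetical, and renaming `output` is only a precaution so that features extracted with a different keypoint budget are not picked up by mistake.

```python
import copy

# Assumed shape of an hloc SuperPoint feature config; verify against
# the configs shipped with your hloc checkout.
superpoint_conf = {
    "output": "feats-superpoint-n4096-r1024",
    "model": {"name": "superpoint", "nms_radius": 3, "max_keypoints": 4096},
    "preprocessing": {"grayscale": True, "resize_max": 1024},
}


def with_max_keypoints(conf, max_keypoints):
    """Return a copy of conf limited to max_keypoints, leaving conf untouched."""
    new_conf = copy.deepcopy(conf)
    new_conf["model"]["max_keypoints"] = max_keypoints
    # Assumption: the output name encodes the keypoint budget, so update it too.
    resize = new_conf["preprocessing"]["resize_max"]
    new_conf["output"] = f"feats-superpoint-n{max_keypoints}-r{resize}"
    return new_conf


# Configuration with the 2048-keypoint limit that D2S was tested with.
d2s_conf = with_max_keypoints(superpoint_conf, 2048)
```

The copy keeps the stock configuration intact, which makes it easy to see exactly what differs from the hloc defaults.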
Command Line Arguments for preprocessing.py


--dataset_dir: Path to the dataset folder (../third_party/Hierarchical_Localization/datasets/ by default).


--dataset: Name of the dataset (e.g., 7scenes, Cambridge).


--scene: Name of the scene (e.g., chess, fire).


Path to the directory where you store the result after running hloc (../third_party/Hierarchical-Localization/outputs/ by default).


Path to the output directory (../dataset by default).


Option to perform data augmentation on training data (True by default).


Generate pseudo data from unlabeled data (False by default).


Perform augmentation on unlabeled data (False by default).
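When several scenes need preprocessing, the command shown earlier can be generated in a loop. The sketch below uses only the three flags that appear in that command (`--dataset_dir`, `--dataset`, `--scene`). The function names are hypothetical, and it assumes the flags are sufficient on their own, which may not hold if preprocessing.py also needs the in-file configuration mentioned in step 2.

```python
import shlex
import subprocess
import sys


def preprocessing_command(dataset_dir, dataset, scene):
    """Build the preprocessing.py command for one scene as an argument list."""
    return [sys.executable, "preprocessing.py",
            "--dataset_dir", dataset_dir,
            "--dataset", dataset,
            "--scene", scene]


def preprocess_scenes(dataset_dir, dataset, scenes, dry_run=True):
    """Print (dry run) or execute the preprocessing command for each scene."""
    commands = [preprocessing_command(dataset_dir, dataset, s) for s in scenes]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Mirrors `cd processing` from the instructions above.
            subprocess.run(cmd, check=True, cwd="processing")
    return commands


commands = preprocess_scenes(
    "../third_party/Hierarchical_Localization/datasets/",
    "7scenes",
    ["chess", "fire"],
)
```

The dry run is the default so that the generated commands can be inspected first; pass `dry_run=False` from the repository root to execute them.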

Training & Evaluation

To log the training progress, start a Visdom server in a separate terminal by running:

python -m visdom.server -env_path=logs/

Then execute this command to train and evaluate the results:

sh run_train_eval.sh

To evaluate a single checkpoint, run for example:

python eval.py --dataset 7scenes --scene chess --config_file configs/configsV2.ini --model 2 --cudaid 0 --single 1 --epoch 900

