
AnyLoc: Towards Universal Visual Place Recognition



Contents

The contents of this repository are as follows:

  1. demo: Standalone demo scripts (quick start, Jupyter notebook, and Gradio app) to run our AnyLoc-VLAD-DINOv2 method, plus guides for the APIs. This folder is self-contained (it doesn't use anything outside it).
  2. scripts: All scripts for development. Use the -h option for argument information.
  3. configs.py: Global configurations for the repository.
  4. utilities: Utility classes and functions (includes DINOv2 hooks and VLAD).
  5. conda-environment.yml: The conda environment (it could fail to install OpenAI CLIP as it includes a git+ URL). We suggest you use the setup_conda.sh script instead.
  6. requirements.txt: Requirements file for a pip virtual environment. Probably out of date.
  7. custom_datasets: Custom dataloader implementations for VPR.
  8. examples: Miscellaneous example scripts.
  9. MixVPR: Minimal MixVPR inference code.
  10. clip_wrapper.py: A wrapper around two CLIP implementations (OpenAI and OpenCLIP).
  11. models_mae.py: MAE implementation.
  12. dino_extractor.py: DINO (v1) feature extractor.
  13. CONTRIBUTING.md: Notes for contributors.
  14. paper_utils: Paper scripts (formatting for figures, etc.).

Included Repositories

Includes the following repositories (currently not submodules) as subfolders.

  • dvgl-benchmark (gmberton/deep-visual-geo-localization-benchmark, cloned 2023-02-12): used for benchmarking
  • datasets-vg (gmberton/datasets_vg, cloned 2023-02-13): used for dataset download and formatting
  • CosPlace (gmberton/CosPlace, cloned 2023-03-20): used for baseline comparisons

Included Datasets

All the benchmarking datasets are included in our public release.

  1. Download the .tar.gz files for the datasets you want to use from the Datasets-All folder in our public release

  2. Unzip them using tar -xvzf ./NAME.tar.gz; each archive unzips into a directory named NAME (a Python alternative is sketched after this list).

    • If you're using our benchmarking scripts, the directory where you store the datasets is passed as the --prog.data-vg-dir parameter (in most scripts).
    • See Dataset Setup for detailed information (including how the data directory structure should look after unzipping)
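
If you prefer to script the extraction step in Python, here is a minimal sketch (assuming the downloaded .tar.gz files sit in your dataset folder, treated as the current directory):

import glob
import tarfile

# Extract every downloaded archive; each one unpacks into a directory
# named after the dataset (e.g. gardens/)
for archive in sorted(glob.glob("./*.tar.gz")):
    with tarfile.open(archive, "r:gz") as tf:
        tf.extractall(path=".")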

We thank the following sources for the rich datasets

  1. Baidu Autonomous Driving Business Unit for the Baidu Mall dataset present in baidu_datasets.tar.gz
  2. Queensland University of Technology for the Gardens Point dataset present in gardens.tar.gz
  3. York University for the 17 Places dataset present in 17places.tar.gz
  4. Tokyo Institute of Technology, INRIA, and CTU Prague for the Pitts-30k dataset present in pitts30k.tar.gz
  5. Queensland University of Technology for the St. Lucia dataset present in st_lucia.tar.gz
  6. University of Oxford for the Oxford dataset present in Oxford_Robotcar.tar.gz
  7. AirLab at CMU for the Hawkins dataset present in hawkins_long_corridor.tar.gz, the Laurel Caverns dataset present in laurel_caverns.tar.gz, and the Nardo Air dataset present in test_40_midref_rot0.tar.gz (not rotated) and test_40_midref_rot90.tar.gz (rotated).
  8. Fraunhofer FKIE and TU Munich for the VP-Air dataset present in VPAir.tar.gz
  9. Ifremer and University of Toulon for the Mid-Atlantic Ridge dataset present in eiffel.tar.gz

Most of the contents of the zipped folders are from the original sources. We generate the ground truth for some of the datasets as .npy files; see this issue for more information.

The copyright of each dataset is held by the original sources.


Getting Started

Tip: You can explore the HuggingFace Space and the Colab notebooks (no GPU needed).

Clone this repository

git clone https://github.com/AnyLoc/AnyLoc.git
cd AnyLoc

Set up the conda environment (you can also use mamba instead of conda; the script will automatically detect it)

conda create -n anyloc python=3.9
conda activate anyloc
bash ./setup_conda.sh
# If you want to install the developer tools as well
bash ./setup_conda.sh --dev

The setup takes about 11 GB of disk space. You can also use an existing conda environment, say vl-vpr, by doing

bash ./setup_conda.sh vl-vpr

Note the following:

  • All the release files referenced below are available in our public release (hosted on IIITH-OneDrive).
    • If the conda environment setup is taking time, you could just unzip conda-env.tar.gz (GB) in your ~/anaconda3/envs folder (but compatibility is not guaranteed).
  • The ./scripts folder is for validating our results and seeing the main scripts. Most applications are in the ./demo folder. See the list of demos before running anything.
  • If you're running something in the ./scripts folder, run it from this (repository) folder, i.e. with pwd at the repository root. For example, Python scripts are run as python ./scripts/<script>.py and bash scripts as bash ./scripts/<script>.sh. For the demos and other baselines, you should cd into the respective folders.
  • The utilities.py file is mainly for ./scripts files. All demos actually use the demo/utilities.py file (which is distilled and minimal). Using the latter should be enough to implement our SOTA method.

Using the SOTA: AnyLoc-VLAD-DINOv2

This method is available both as a Colab notebook and as a local Python script (see the demo folder).

Using the APIs

Import the utilities

from utilities import DinoV2ExtractFeatures
from utilities import VLAD

DINOv2

The DINOv2 feature extractor can be used as follows

extractor = DinoV2ExtractFeatures("dinov2_vitg14", desc_layer,
        desc_facet, device=device)

Get the descriptors using

# Make image patchable (14, 14 patches)
c, h, w = img_pt.shape
h_new, w_new = (h // 14) * 14, (w // 14) * 14
img_pt = tvf.CenterCrop((h_new, w_new))(img_pt)[None, ...]
# Main extraction
ret = extractor(img_pt) # [1, num_patches, desc_dim]
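
For reference, here is a self-contained sketch that puts the two snippets above together. The image path is illustrative, and the layer/facet values (31, "value") follow the AnyLoc-VLAD-DINOv2 demo; check the demo scripts for the exact settings.

import torch
import torchvision.transforms as tvf
from PIL import Image

from utilities import DinoV2ExtractFeatures

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
desc_layer, desc_facet = 31, "value"    # illustrative; see the demo scripts
extractor = DinoV2ExtractFeatures("dinov2_vitg14", desc_layer, desc_facet,
        device=device)

# Load an image and convert it to a normalized tensor (ImageNet statistics)
img = Image.open("./example.jpg").convert("RGB")    # illustrative path
to_tensor = tvf.Compose([
    tvf.ToTensor(),
    tvf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img_pt = to_tensor(img).to(device)

# Make image patchable (14, 14 patches) and add a batch dimension
c, h, w = img_pt.shape
h_new, w_new = (h // 14) * 14, (w // 14) * 14
img_pt = tvf.CenterCrop((h_new, w_new))(img_pt)[None, ...]

# Main extraction
with torch.no_grad():
    ret = extractor(img_pt)    # [1, num_patches, desc_dim]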

VLAD

The VLAD aggregator can be loaded with vocabulary (cluster centers) from a c_centers.pt file.

# Main VLAD object
vlad = VLAD(num_c, desc_dim=None, cache_dir=os.path.dirname(c_centers_file))
vlad.fit(None)  # Load the vocabulary (and auto-detect `desc_dim`)
# Cluster centers have shape: [num_c, desc_dim]
#   - num_c: number of clusters
#   - desc_dim: descriptor dimension

If you have a database of descriptors you want to fit, use

vlad.fit(ein.rearrange(full_db_vlad, "n k d -> (n k) d"))
# n: number of images
# k: number of patches/descriptors per image
# d: descriptor dimension
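
For instance, if you extracted per-image DINOv2 descriptors as shown above (and all the database images share the same resolution, so each yields the same number of patches), a vocabulary could be fit like this. The image list (db_image_tensors) and the cluster count are placeholders:

import einops as ein
import torch

# db_image_tensors: hypothetical list of preprocessed image tensors, each
# already center-cropped and batched as in the DINOv2 snippet above
with torch.no_grad():
    db_descs = [extractor(img)[0] for img in db_image_tensors]  # each [k, d]
full_db_vlad = torch.stack(db_descs)                            # [n, k, d]

num_c = 32    # number of VLAD clusters (placeholder)
vlad = VLAD(num_c, desc_dim=full_db_vlad.shape[2])
vlad.fit(ein.rearrange(full_db_vlad, "n k d -> (n k) d"))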

To get the VLAD representations of multiple images, use

db_vlads: torch.Tensor = vlad.generate_multi(full_db)
# Shape of full_db: [n_db, n_d, d_dim]
#   - n_db: number of images in the database
#   - n_d: number of descriptors per image
#   - d_dim: descriptor dimension
# Shape of db_vlads: [n_db, num_c * d_dim]
#   - num_c: number of clusters (centers)
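
Putting it together, place recognition reduces to a nearest neighbor search over these VLAD vectors. A minimal sketch, assuming full_db and full_q hold the patch descriptors of the database and query images (shapes as described above):

import torch
import torch.nn.functional as F

db_vlads = vlad.generate_multi(full_db)    # [n_db, num_c * d_dim]
q_vlads = vlad.generate_multi(full_q)      # [n_q,  num_c * d_dim]

# Cosine similarity between every query and every database VLAD vector
sims = F.normalize(q_vlads, dim=1) @ F.normalize(db_vlads, dim=1).T  # [n_q, n_db]
top1 = sims.argmax(dim=1)    # best-matching database index for each query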

DINOv1

This is present in dino_extractor.py (not a part of demo/utilities.py).

Initialize and use the extractor as follows

# Import it (einops and torch.nn.functional are also needed below)
import einops as ein
import torch.nn.functional as F
from dino_extractor import ViTExtractor
...

# Initialize it (the layer and facet are specified when extracting descriptors)
extractor = ViTExtractor("dino_vits8", stride=4, 
        device=device)
...

# Use it to extract patch descriptors
img = ein.rearrange(img, "c h w -> 1 c h w").to(device)
img = F.interpolate(img, (224, 298))    # For 4:3 images
desc = extractor.extract_descriptors(img,
        layer=11, facet="key") # [1, 1, num_descs, d_dim]
...
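
The resulting patch descriptors can be aggregated with the same VLAD class shown earlier; a short sketch (assuming vlad was fitted on DINO v1 descriptors of the matching dimension):

# Drop the leading singleton dim: [1, 1, num_descs, d_dim] -> [1, num_descs, d_dim]
patch_descs = desc[0]
# Aggregate into a single global descriptor of shape [num_c * d_dim]
global_desc = vlad.generate_multi(patch_descs)[0]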

Validating the Results

You don't need to read further if you're not experimentally validating the entire set of results or building on this repository from source (enjoy the demos instead).

The following sections are for the curious minds who want to reproduce the results.

Note for contributors: please follow the contributing guidelines (see CONTRIBUTING.md). This is mainly for developers/authors who'll be pushing to this repository.

All the runs were done on a machine with the following specifications:

  • CPU: Two Intel Xeon Gold 5317 CPUs (12C24T each)
  • CPU RAM: 256 GB
  • GPUs: Four NVIDIA RTX 3090 GPUs (24 GB, 10496 CUDA cores each)
  • Storage: 3.5 TB HDD mounted on /scratch. All the datasets together take 32+ GB; budget more for other requirements (VLAD cluster centers, caching, models, etc.). We noticed that singularity (with SIF, cache, and tmp) used 90+ GB.
  • Driver Version (NVIDIA-SMI): 570.47.03
  • CUDA (SMI): 11.6

We use only one GPU; however, some experiments (with large datasets) might need all of the CPU RAM (for efficient/fast nearest neighbor search). Ideally, a 16 GB GPU should also work.

Do the following:

  1. Clone the repository and set up the NVIDIA NGC container (run everything inside it)
  2. Set up the datasets (download, format, and extract them)
  3. Run the script you want to test from the scripts folder

Start by cloning/setting up the repository

cd ~/Documents
git clone https://github.com/AnyLoc/AnyLoc.git vl-vpr

NVIDIA NGC Singularity Container Setup

Despite following PyTorch's recommended reproducibility practices (see the seed_everything function in utilities.py), we noticed minor variations in results across GPU types and CUDA versions. To mitigate this, we recommend using a Singularity container.

Setting up the environment in a singularity container (in a SLURM environment)

TL;DR: Run the following (note that this is a different system from the one listed above). This was tested on CMU's Bridges-2 partition of the PSC HPC. Don't use this if you want to replicate the tables in the paper (though the numbers come close).

salloc -p GPU-small -t 01:00:00 --ntasks-per-node=5 --gres=gpu:v100-32:1
cd /ocean/containers/ngc/pytorch/
singularity instance start --nv pytorch_22.12-py3.sif vlvpr
singularity run --nv instance://vlvpr
cd /ocean/projects/cis220039p/nkeetha/data/singularity/venv
source vlvpr/bin/activate
cd /ocean/projects/cis220039p/<path to vl-vpr scripts folder>

Main setup: Singularity on IIITH's Ada HPC (Ubuntu 18.04), which is our main setup for validation. Use this if you want to replicate the tables in the paper (the hardware should be the same as listed above).

The script below assumes that this repository is cloned in ~/Documents/vl-vpr. That is, this README is at ~/Documents/vl-vpr/README.md.

# Load the module and configurations
module load u18/singularity-ce/3.9.6
mkdir -p /scratch/$USER/singularity && cd $_ && mkdir .cache .tmp venvs
export SINGULARITY_CACHEDIR=/scratch/$USER/singularity/.cache
export SINGULARITY_TMPDIR=/scratch/$USER/singularity/.tmp
# Ensure that the next command gives output "1" (or anything other than "0")
cat /proc/sys/kernel/unprivileged_userns_clone
# Setup the container (download the image if not there already) - (15 GB cache + 7.5 GB file)
singularity pull ngc_pytorch_22.12-py3 docker://nvcr.io/nvidia/pytorch:22.12-py3
# Test container through shell
singularity shell --nv ngc_pytorch_22.12-py3
# Start and run the container (mount the symlinked and scratch folders)
singularity instance start --mount "type=bind,source=/scratch/$USER,destination=/scratch/$USER" \
    --nv ngc_pytorch_22.12-py3 vl-vpr
singularity run --nv instance://vl-vpr
# Create virtual environment
cd ~/Documents/vl-vpr/
pip install virtualenv
cd venvs
virtualenv --system-site-packages vl-vpr
# Activate virtualenv and install all packages
cd ~/Documents/vl-vpr/
source ./venvs/vl-vpr/bin/activate
bash ./setup_virtualenv_ngc.sh
# Run anything you want (from here, but find the file in scripts)
cd ~/Documents/vl-vpr/
python ./scripts/<task name>.py <args>
# The baseline scripts should be run in their own folders. For example, to run CosPlace, do
cd ~/Documents/vl-vpr/
cd CosPlace
python ./<script>.py

Dataset Setup

Datasets Note: See the Datasets-All folder in our public material (for .tar.gz files). Also see included datasets.

Set them up in a folder with sufficient space

mkdir -p /scratch/$USER/vl-vpr/datasets && cd $_

Download (and unzip) the datasets from the Datasets-All folder in our public release into this folder. Link this folder (for easy access from this repository)

cd ~/Documents/vl-vpr/
cd ./datasets-vg
ln -s /scratch/$USER/vl-vpr/datasets datasets

After setting up all datasets, the folders should look like this (in the dataset folder). Run the following command to get the tree structure.

tree ./eiffel ./hawkins*/ ./laurel_caverns ./VPAir ./test_40_midref_rot*/ ./Oxford_Robotcar ./gardens ./17places ./baidu_datasets ./st_lucia ./pitts30k --filelimit=20 -h
  • The test_40_midref_rot0 directory is Nardo Air. It is also referred to as Tartan_GNSS_notrotated in our scripts.
  • The test_40_midref_rot90 directory is Nardo Air-R (rotated). It is also referred to as Tartan_GNSS_rotated in our scripts.
  • The hawkins_long_corridor directory is the Hawkins dataset (degraded environment).
  • The eiffel directory is the Mid-Atlantic Ridge dataset (underwater).

Output will be something like

./eiffel
├── [4.0K]  db_images [65 entries exceeds filelimit, not opening dir]
├── [2.2K]  eiffel_gt.npy
└── [4.0K]  q_images [101 entries exceeds filelimit, not opening dir]
./hawkins_long_corridor/
├── [4.0K]  db_images [127 entries exceeds filelimit, not opening dir]
├── [ 12K]  images [314 entries exceeds filelimit, not opening dir]
├── [ 17K]  pose_topic_list.npy
└── [4.0K]  q_images [118 entries exceeds filelimit, not opening dir]
./laurel_caverns
├── [4.0K]  db_images [141 entries exceeds filelimit, not opening dir]
├── [ 20K]  images [744 entries exceeds filelimit, not opening dir]
├── [ 41K]  pose_topic_list.npy
└── [4.0K]  q_images [112 entries exceeds filelimit, not opening dir]
./VPAir
├── [ 677]  camera_calibration.yaml
├── [420K]  distractors [10000 entries exceeds filelimit, not opening dir]
├── [4.0K]  distractors_temp
├── [ 321]  License.txt
├── [177K]  poses.csv
├── [ 72K]  queries [2706 entries exceeds filelimit, not opening dir]
├── [160K]  reference_views [2706 entries exceeds filelimit, not opening dir]
├── [ 96K]  reference_views_npy [2706 entries exceeds filelimit, not opening dir]
└── [ 82K]  vpair_gt.npy
./test_40_midref_rot0/
├── [ 46K]  gt_matches.csv
├── [2.8K]  network_config_dump.yaml
├── [5.3K]  query.csv
├── [4.0K]  query_images [71 entries exceeds filelimit, not opening dir]
├── [2.9K]  reference.csv
└── [4.0K]  reference_images [102 entries exceeds filelimit, not opening dir]
./test_40_midref_rot90/
├── [ 46K]  gt_matches.csv
├── [2.8K]  network_config_dump.yaml
├── [5.3K]  query.csv
├── [4.0K]  query_images [71 entries exceeds filelimit, not opening dir]
├── [2.9K]  reference.csv
└── [4.0K]  reference_images [102 entries exceeds filelimit, not opening dir]
./Oxford_Robotcar
├── [4.0K]  __MACOSX
│   └── [4.0K]  oxDataPart
├── [4.0K]  oxDataPart
│   ├── [4.0K]  1-m [191 entries exceeds filelimit, not opening dir]
│   ├── [ 24K]  1-m.npz
│   ├── [ 13K]  1-m.txt
│   ├── [4.0K]  1-s [191 entries exceeds filelimit, not opening dir]
│   ├── [ 24K]  1-s.npz
│   ├── [4.0K]  1-s-resized [191 entries exceeds filelimit, not opening dir]
│   ├── [ 13K]  1-s.txt
│   ├── [4.0K]  2-s [191 entries exceeds filelimit, not opening dir]
│   ├── [ 24K]  2-s.npz
│   ├── [4.0K]  2-s-resized [191 entries exceeds filelimit, not opening dir]
│   └── [ 13K]  2-s.txt
├── [ 15K]  oxdatapart.mat
└── [ 66M]  oxdatapart_seg.npz
./gardens
├── [4.0K]  day_left [200 entries exceeds filelimit, not opening dir]
├── [4.0K]  day_right [200 entries exceeds filelimit, not opening dir]
├── [3.6K]  gardens_gt.npy
└── [4.0K]  night_right [200 entries exceeds filelimit, not opening dir]
./17places
├── [ 14K]  ground_truth_new.npy
├── [ 13K]  my_ground_truth_new.npy
├── [ 12K]  query [406 entries exceeds filelimit, not opening dir]
├── [ 514]  ReadMe.txt
└── [ 12K]  ref [406 entries exceeds filelimit, not opening dir]
./baidu_datasets
├── [4.0G]  IDL_dataset_cvpr17_3852.zip
├── [387M]  mall.pcd
├── [108K]  query_gt [2292 entries exceeds filelimit, not opening dir]
├── [ 96K]  query_images_undistort [2292 entries exceeds filelimit, not opening dir]
├── [2.7K]  readme.txt
├── [ 44K]  training_gt [689 entries exceeds filelimit, not opening dir]
└── [ 36K]  training_images_undistort [689 entries exceeds filelimit, not opening dir]
./st_lucia
├── [4.0K]  images
│   └── [4.0K]  test
│       ├── [180K]  database [1549 entries exceeds filelimit, not opening dir]
│       └── [184K]  queries [1464 entries exceeds filelimit, not opening dir]
└── [695K]  map_st_lucia.png
./pitts30k
└── [4.0K]  images
    ├── [4.0K]  test
    │   ├── [1.2M]  database [10000 entries exceeds filelimit, not opening dir]
    │   ├── [5.9M]  database.npy
    │   ├── [864K]  queries [6816 entries exceeds filelimit, not opening dir]
    │   └── [4.0M]  queries.npy
    ├── [4.0K]  train
    │   ├── [1.3M]  database [10000 entries exceeds filelimit, not opening dir]
    │   ├── [5.9M]  database.npy
    │   ├── [948K]  queries [7416 entries exceeds filelimit, not opening dir]
    │   └── [4.4M]  queries.npy
    └── [4.0K]  val
        ├── [1.3M]  database [10000 entries exceeds filelimit, not opening dir]
        ├── [5.8M]  database.npy
        ├── [980K]  queries [7608 entries exceeds filelimit, not opening dir]
        └── [4.4M]  queries.npy

These directories go under the ./datasets_vg/datasets folder (you can store them in scratch and symlink them there). For example, the 17places dataset can be found under the ./datasets_vg/datasets/17places folder.


Cite Our Work

Thanks for using our work. You can cite it as:

@article{keetha2023anyloc,
  title={AnyLoc: Towards Universal Visual Place Recognition}, 
  author={Keetha, Nikhil and Mishra, Avneesh and Karhade, Jay and Jatavallabhula, Krishna Murthy and Scherer, Sebastian and Krishna, Madhava and Garg, Sourav},
  journal={IEEE Robotics and Automation Letters},
  year={2023},
  publisher={IEEE},
  volume={9},
  number={2},
  pages={1286-1293},
  doi={10.1109/LRA.2023.3343602}
}

Developers: