Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence
⭐ACCV 2024⭐

Felipe Cadar · Guilherme Potje · Renato Mastins · Cédric Demonceaux · Erickson R Nascimento


example
Leveraging semantic information for improving visual correspondence.

Installation

To set up the environment for training, run the following command to create a new conda environment. We recommend using Python 3.9:

conda create -n reason  python=3.9

Activate the environment before proceeding:

conda activate reason

Install the package:

pip install -e .

Inference

from reasoning.features.desc_reasoning import load_reasoning_from_checkpoint, Reasoning

# load the model with pre-trained weights
semantic_reasoning = load_reasoning_from_checkpoint('models/xfeat/')
# load it into the auxiliary class
reasoning_model = Reasoning(semantic_reasoning['model'])

# match two images
match_response = reasoning_model.match({
    'image0': image0, # BxCxHxW normalized to [0,1]
    'image1': image1  # BxCxHxW normalized to [0,1]
})

# get the matches
mkpts0 = match_response['matches0'] # BxNx2
mkpts1 = match_response['matches1'] # BxNx2

The example.py script shows how to automatically download and run a specific model.

The following table contains links to all the models and weights we used in our experiments.

Descriptor Pre-trained weights Size
xfeat Download 91.6 MB
superpoint Download 91.0 MB
alike Download 92.1 MB
aliked Download 91.9 MB
dedode_B Download 92.2 MB
dedode_G Download 94.1 MB
xfeat-12_layers-dino_G Download 221.0 MB
xfeat-12_layers Download 219.0 MB
xfeat-3_layers Download 57.1 MB
xfeat-7_layers Download 132 MB
xfeat-9_layers Download 167 MB
xfeat-dino-G Download 94.3 MB
xfeat-dino_B Download 92.3 MB
xfeat-dino_L Download 92.6 MB

Training

You might want to train your own model to reason about your own descriptors. You need to take some preparations:

1. Scannet Data Preparation

The processed dataset is available for download here: h5_scannet.zip

But if you want to follow the same steps we took to create it, take a look at the steps bellow.

To prepare the Scannet dataset for training, follow these steps:

  1. Download Scannet: First, download the Scannet dataset. Make sure to read and accept the terms of use.
python reasoning/scripts/scannet/01_download_scannet.py --out_dir datasets/scannet
  1. Extract Frames: Extract frames from the downloaded dataset, skipping every 15 frames.
python reasoning/scripts/scannet/02_extract_scannet.py --data_path datasets/scannet
  1. Calculate Covisibility: Calculate the covisibility between frames to identify good pairs for training.
python reasoning/scripts/scannet/03_calculate_scannet_covisibility.py --data_path datasets/scannet
  1. Convert to H5 Files: Convert the prepared data into H5 files for easier handling during training. It also helps to keep the number of files small in cluster enviroments.
python reasoning/scripts/scannet/04_build_h5.py --data_path datasets/scannet --output datasets/h5_scannet/

2. Feature Extraction

To speed up the training process, pre-extract some features from the dataset. Ours scripts read the h5 dataset and save the features to the save directory

DINOv2-S Features Extraction

Extract DINOv2-S features from the H5 dataset. You can adjust the batch size according to your system's capabilities.

python reasoning/scripts/export_dino.py --data ./datasets/h5_scannet --batch_size 4 --dino_model dinov2_vits14

For larger models, simply change the --dino_model argument to one of the following: dinov2_vitb14, dinov2_vitl14, or dinov2_vitg14.

XFeat Features Extraction

Extract XFeat features from the dataset. Adjust the batch size as needed.

python reasoning/scripts/export_xfeat.py --data ./datasets/h5_dataset --batch_size 4 --num_keypoints 2048 h5_scannet

Your dataset folder should look like this:

datasets/
├── h5_scannet/
│   ├── train/
│   ├── features/
│   │   ├── dino-scannet-dinov2_vits14/
│   │   └── xfeat-scannet-n2048/
└── scannet/
    └── scans/

For other descriptors, please check the reasoning/scripts/export_*.py scripts.

3. Training the Model

All training and experiments were conducted on a SLURM cluster with 4xV100 32GB GPUs. Adjust the batch size to match your system's capabilities.

To start training, run the following command:

python reasoning/train_multigpu_reasoning.py \
    --batch_size 16 \ 
    --data ./datasets/h5_scannet \ # dataset folder with images and features
    --plot_every 200 \ # tensorboard matching plots
    --extractor_cache 'xfeat-scannet-n2048' \ # local features
    --dino_cache 'dino-scannet-dinov2_vits14' \ # semantic features
    -C xfeat-dinov2 # comment for tracking your exps

If you want to skip all the multi-gpu shenanigans, you can simply add the --local flag.

Acknowledgements

This work was partially supported by grants from CAPES, CNPq, FAPEMIG, Google, ANER MOVIS from Conseil Régional BFC and ANR (ANR-23-CE23-0003-01), to whom we are grateful. This project was also provided with AI computing and storage resources by GENCI at IDRIS thanks to the grant 2024-AD011015289 on the supercomputer Jean Zay’s V100 partitions.

Shout out to the authors of DeDoDe for this readme header. Its quite nice.