vg-transformers

Official repository of "Learning Sequential Descriptors for Sequence-based Visual Place Recognition"

Learning Sequential Descriptors for Sequence-based Visual Place Recognition


Figure: Taxonomy of sequential descriptor methods.

This is the official repository for the paper "Learning Sequential Descriptors for Sequence-based Visual Place Recognition". It can be used to reproduce results from the paper and experiment with a wide range of sequential descriptor methods for Visual Place Recognition.

Install Locally

Create your local environment and then install the required packages using:

pip install -r pip_requirements.txt
# to install the official TimeSformer package
git clone https://github.com/facebookresearch/TimeSformer
cd TimeSformer  
python setup.py build develop

Datasets

The experiments in the paper use two main datasets: Mapillary Street Level Sequences (MSLS) and Oxford RobotCar.

MSLS
Download the dataset from here and then reformat it using:
python main_scripts/msls/1_reformat_mapillary.py original_MSLS/folder/path destination/folder/path
python main_scripts/msls/2_reformat_testset_msls.py reformatted/MSLS/path
Oxford RobotCar
In our experiments, we used the following laps of Oxford RobotCar as train/validation/test sets:
  • Train set:
    • queries: lap 2014-12-17-18-18-43 (winter night, rain);
    • database: lap 2014-12-16-09-14-09 (winter day, sun);
  • Validation set:
    • queries: lap 2015-02-03-08-45-10 (winter day, snow);
    • database: lap 2015-11-13-10-28-08 (fall day, overcast).
  • Test set:
    • queries: lap 2014-12-16-18-44-24 (winter night);
    • database: lap 2014-11-18-13-20-12 (fall day).

We provide the two pre-processed versions that we used in our experiments:

  • Fixed-Space sampling, keeping one frame every 2 meters: link
  • Fixed-Time sampling, keeping one frame every 3.6 seconds: link

The first one is more consistent with the MSLS setup. For the second one, the 3.6-second threshold was chosen to keep the number of images comparable to the first version.
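
As a rough illustration of the difference between the two sampling strategies, here is a minimal sketch. It is not the logic of the repository's preprocessing scripts: the function names `sample_fixed_space` and `sample_fixed_time` are hypothetical, and it assumes frame positions are already given in metric coordinates (e.g. metres in a local plane) and timestamps in seconds.

```python
import math

def sample_fixed_space(positions, min_dist_m=2.0):
    """Return indices of frames kept so that consecutive kept frames
    are at least min_dist_m apart (positions: list of (x, y) in metres)."""
    kept = []
    last = None
    for i, (x, y) in enumerate(positions):
        if last is None or math.hypot(x - last[0], y - last[1]) >= min_dist_m:
            kept.append(i)
            last = (x, y)
    return kept

def sample_fixed_time(timestamps, min_dt_s=3.6):
    """Return indices of frames kept so that consecutive kept frames
    are at least min_dt_s seconds apart (timestamps: sorted, in seconds)."""
    kept = []
    last_t = None
    for i, t in enumerate(timestamps):
        if last_t is None or t - last_t >= min_dt_s:
            kept.append(i)
            last_t = t
    return kept
```

With fixed-space sampling the number of kept frames depends only on the distance travelled, while with fixed-time sampling it also depends on the vehicle speed, which is why a time threshold has to be tuned to obtain a similar image count.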

Alternatively, you can download the full, raw version of the dataset from the official website and preprocess it using the following commands:

python main_scripts/robotcar/1_downloader.py
python main_scripts/robotcar/2_untar.py
python main_scripts/robotcar/3_dataset_builder_all.py
python main_scripts/robotcar/4_reduce_density.py
python main_scripts/robotcar/5_format_tree.py

Model zoo

We are currently exploring hosting options, so this is a partial list of models. More models will be added soon! If you need a particular model, feel free to open an issue and we will provide it.

Pretrained models with SeqVLAD and different backbones
Pretrained networks employing different backbones.

Trained on MSLS, sequence length 5:

Model              MSLS (R@1)   Download
CCT384 + SeqVLAD   89.6         [Link]

Run Experiments

Once the datasets are ready, you can run the experiments with the architecture of your choice.

NB: building MSLS sequences requires some heavy pre-processing to build the underlying data structures. The dataset class caches them automatically, so they are computed only once: the first experiment you launch will take 2-3 hours to build these structures and save them in a cache directory, and subsequent experiments will start quickly. The cache stores everything with relative paths, so to run experiments on multiple machines you can simply copy the cache directory. Finally, note that these data structures must be computed for each sequence length, so the cache may contain one file for each sequence_length you experiment with.
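
The caching behaviour described above can be pictured with the following minimal sketch. It is only an illustration under assumptions, not the repository's dataset class: `build_sequences`, `load_or_build`, the file naming scheme and the use of `pickle` are all hypothetical, and the sliding window stands in for the real, much heavier pre-processing.

```python
import pickle
from pathlib import Path

def build_sequences(frame_paths, seq_length):
    """Placeholder for the heavy step: sliding windows of consecutive frames."""
    return [frame_paths[i:i + seq_length]
            for i in range(len(frame_paths) - seq_length + 1)]

def load_or_build(cache_dir, split, seq_length, frame_paths):
    """Return (sequences, was_cached). One cache file per split and seq_length."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{split}_seq{seq_length}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f), True
    # frame_paths are relative to the dataset root, so the cached file
    # stays valid when the cache directory is copied to another machine.
    sequences = build_sequences(frame_paths, seq_length)
    with cache_file.open("wb") as f:
        pickle.dump(sequences, f)
    return sequences, False
```

Keying the file name on the sequence length is what makes a separate cache entry appear for every value of seq_length that is used.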

Example with CCT-384 + SeqVLAD on MSLS:

python main_scripts/main_train.py \
	--dataset_path <MSLS path> \
	--img_shape 384 384 \
	--arch cct384 --aggregation seqvlad \
	--trunc_te 8 --freeze_te 1 \
	--train_batch_size 4 --nNeg 5 --seq_length 5 \
	--optim adam --lr 0.00001

Example with TimeSformer:

python main_scripts/main_train.py \
	--dataset_path <MSLS path> \
	--img_shape 224 224 \
	--arch timesformer --aggregation _ \
	--train_batch_size 4 --nNeg 5 --seq_length 5 \
	--optim adam --lr 0.00001

Example with ResNet-18 + GeM + CAT:

python main_scripts/main_train.py \
	--dataset_path <MSLS path> \
	--img_shape 480 640 \
	--arch r18l3 --pooling gem --aggregation cat \
	--train_batch_size 4 --nNeg 5 --seq_length 5 \
	--optim adam --lr 0.00001
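
To give an intuition for what a GeM + CAT configuration computes, here is a small pure-Python sketch. It is not the repository's implementation: `gem_pool` and `cat_aggregate` are hypothetical names, the exponent p=3 is only a commonly used default, and feature maps are represented as plain nested lists for readability. GeM (generalized mean pooling) turns each channel of a frame's feature map into a single value, and CAT concatenates the per-frame descriptors into one sequence descriptor.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean pooling.
    feature_map: list of C channels, each a flat list of spatial activations.
    Returns a C-dimensional frame descriptor."""
    return [
        (sum(max(v, eps) ** p for v in channel) / len(channel)) ** (1.0 / p)
        for channel in feature_map
    ]

def cat_aggregate(frame_descriptors):
    """Concatenate per-frame descriptors into one sequence descriptor.
    The output dimension is seq_length * C."""
    return [v for descriptor in frame_descriptors for v in descriptor]

# Toy sequence: 5 frames, each with 2 channels of 4 spatial activations.
frames = [[[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]] for _ in range(5)]
sequence_descriptor = cat_aggregate([gem_pool(f) for f in frames])
```

Because CAT concatenates, the descriptor size grows linearly with the sequence length, which is one reason a dimensionality reduction step can be useful for such models.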

Experiments on RobotCar

For experiments on RobotCar, we did not change any hyperparameters with respect to the experiments on MSLS. You can therefore select the backbone-pooling-aggregation configuration you want, as in the examples above, and replace --dataset_path <MSLS path> with --dataset_path <Robotcar path>. Follow the instructions above to download the dataset.

Add PCA

To add PCA to SeqVLAD or CAT models, use:

python main_scripts/evaluation.py \
	--pca_outdim <descr. dim.> \
	--resume <path trained model w/o PCA> 

where the parameter --pca_outdim determines the final descriptor dimensionality (in our tests we used 4096).
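
For readers unfamiliar with this step, the sketch below shows the general idea of reducing descriptors to a target dimensionality with PCA, using NumPy. It is not what evaluation.py does internally: `fit_pca` and `apply_pca` are hypothetical helpers, and the L2 normalization after projection is an assumption made here for illustration.

```python
import numpy as np

def fit_pca(descriptors, out_dim):
    """Fit PCA on an (N, D) array of descriptors.
    out_dim must not exceed min(N, D). Returns (mean, components),
    where components has shape (D, out_dim)."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:out_dim].T

def apply_pca(descriptors, mean, components):
    """Project descriptors and L2-normalize them (normalization is an assumption)."""
    reduced = (descriptors - mean) @ components
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)

# Toy usage: reduce 16-dimensional descriptors to 4 dimensions.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(50, 16))
mean, components = fit_pca(train_descriptors, out_dim=4)
reduced = apply_pca(train_descriptors, mean, components)
```

In this sketch the projection is fitted once and then applied unchanged to both database and query descriptors, so that they remain comparable.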

Evaluate trained models

It is possible to evaluate the trained models using:

python main_scripts/evaluation.py \
	--resume <path trained model>
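
The model zoo reports results as R@1, that is, recall at N with N=1. As a reminder of how this kind of metric is typically computed, here is a minimal sketch. It is not taken from evaluation.py: `recall_at_n` is a hypothetical helper, and the set of correct database items for each query (the positives) is assumed to be given.

```python
def recall_at_n(predictions, positives, n_values=(1, 5, 10)):
    """Percentage of queries with at least one correct item in the top N.
    predictions[q]: database indices ranked best-first for query q.
    positives[q]: set of database indices considered correct for query q."""
    recalls = {}
    for n in n_values:
        hits = sum(
            1
            for ranked, correct in zip(predictions, positives)
            if any(idx in correct for idx in ranked[:n])
        )
        recalls[n] = 100.0 * hits / len(predictions)
    return recalls

# Toy usage with 3 queries: only the first one is correct at rank 1.
predictions = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
positives = [{3}, {1}, {5}]
results = recall_at_n(predictions, positives, n_values=(1, 3))
```

A query counts as a success at N as soon as any of its top-N retrieved items is a positive, so recall can only stay equal or increase as N grows.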

Other related Projects

Deep Visual Geo-Localization Benchmark

Resources used in this work

Official SeqNet implementation

Official SeqMatchNet implementation

CCT repository

Cite

Here is the BibTeX entry to cite our paper:

@article{Mereu_2022_seqvlad,
  author={Mereu, Riccardo and Trivigno, Gabriele and Berton, Gabriele and Masone, Carlo and Caputo, Barbara},
  journal={IEEE Robotics and Automation Letters},
  title={Learning Sequential Descriptors for Sequence-Based Visual Place Recognition}, 
  year={2022},
  volume={7},
  number={4},
  pages={10383-10390},
  doi={10.1109/LRA.2022.3194310}
}