Viewpoint Invariant Dense Matching for Visual Geolocalization: Official PyTorch implementation

This is the official implementation of the ICCV 2021 paper:

G. Berton, C. Masone, V. Paolicelli and B. Caputo, Viewpoint Invariant Dense Matching for Visual Geolocalization

[ICCV OpenAccess] [ArXiv] [Video] [BibTex]

Setup

First, download the baseline models, which were trained following the training procedure in the NetVLAD paper. We provide a script that downloads the six models used, which combine 3 backbone encoders (AlexNet, VGG-16 and ResNet-50) with 2 pooling/aggregation layers (GeM and NetVLAD). The models are automatically saved in data/pretrained_baselines.

python download_pretrained_baselines.py

Then prepare your geo-localization dataset so that the directory tree looks like this:

dataset_name
└── images
    ├── train
    │   ├── gallery
    │   └── queries
    ├── val
    │   ├── gallery
    │   └── queries
    └── test
        ├── gallery
        └── queries

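Images must be named @UTM east@UTM north@whatever@.jpg, i.e. the name starts with @, followed by the UTM east and UTM north coordinates separated by @, followed by any other text.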
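The layout and naming convention above can be checked before training. The following is a minimal sketch, not part of the repository: the helper names parse_utm and missing_folders are hypothetical, and the example file name is made up for illustration.

```python
from pathlib import Path


def parse_utm(filename):
    """Return (utm_east, utm_north) from a name like '@east@north@whatever@.jpg'."""
    parts = Path(filename).name.split("@")
    # The name starts with '@', so parts[0] is expected to be the empty string.
    if len(parts) < 4 or parts[0] != "":
        raise ValueError(f"unexpected file name: {filename}")
    return float(parts[1]), float(parts[2])


def missing_folders(dataset_root):
    """List the expected images/<split>/<kind> folders that do not exist."""
    root = Path(dataset_root) / "images"
    expected = [
        root / split / kind
        for split in ("train", "val", "test")
        for kind in ("gallery", "queries")
    ]
    return [str(p) for p in expected if not p.is_dir()]


# Hypothetical file name, used only to illustrate the convention.
east, north = parse_utm("@543256.96@4178906.70@extra@.jpg")
```

An empty list from missing_folders("dataset_name") means all six gallery and queries folders are present.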

Dependencies

See requirements.txt

Training

You can train the model using train.py. Here is an example with the lightest/fastest model (i.e. AlexNet + GeM):

python train.py --arch alexnet --pooling gem --resume_fe data/pretrained_baselines/alexnet_gem.pth

For the full set of options and an explanation of the parameters, run python train.py -h. The script creates a folder at ./runs/default/YYYY-MM-DD_HH-mm-ss, where logs and checkpoints are saved. At the end of training you will see the results of the baseline model, as well as the results when re-ranking is applied using GeoWarp.
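Because each run is stored in a timestamped folder, the most recent one can be located programmatically. This is a small sketch, not part of the repository: latest_run is a hypothetical helper, and RUN_FORMAT assumes that YYYY-MM-DD_HH-mm-ss corresponds to the strftime pattern shown.

```python
from datetime import datetime
from pathlib import Path

# Assumed strftime equivalent of YYYY-MM-DD_HH-mm-ss.
RUN_FORMAT = "%Y-%m-%d_%H-%M-%S"


def latest_run(runs_root="runs/default"):
    """Return the most recent timestamped run folder, or None if there is none."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    runs = []
    for path in root.iterdir():
        if not path.is_dir():
            continue
        try:
            runs.append((datetime.strptime(path.name, RUN_FORMAT), path))
        except ValueError:
            # Skip folders whose names do not match the timestamp pattern.
            continue
    if not runs:
        return None
    return max(runs, key=lambda item: item[0])[1]
```

Calling latest_run() from the repository root returns a Path to the newest run folder, if any exists.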

Evaluation

You can use this code to compute the results with our trained models. To reproduce the results from the paper, download our models by running

python download_trained_hom_reg.py

which automatically downloads the models and saves them under data/trained_homography_regressions. Then, to obtain the results, run

python eval.py --arch alexnet --pooling gem --resume_fe data/pretrained_baselines/alexnet_gem.pth --resume_hr data/trained_homography_regressions/alexnet_gem.pth

This will give you the exact same results as in Table 1 of the paper. For the full set of options and an explanation of the parameters, run python eval.py -h.
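When evaluating more than one model, the command line can be assembled from the architecture and pooling names. This is a sketch only: build_eval_command is a hypothetical helper, and it assumes that every checkpoint follows the <arch>_<pooling>.pth naming seen in the AlexNet + GeM example. Check python eval.py -h for the accepted --arch and --pooling values before using other combinations.

```python
import shlex


def build_eval_command(arch, pooling,
                       baselines_dir="data/pretrained_baselines",
                       hom_reg_dir="data/trained_homography_regressions"):
    """Build the eval.py argument list for one backbone/pooling pair."""
    # Assumption: checkpoints are named <arch>_<pooling>.pth in both folders.
    name = f"{arch}_{pooling}.pth"
    return [
        "python", "eval.py",
        "--arch", arch,
        "--pooling", pooling,
        "--resume_fe", f"{baselines_dir}/{name}",
        "--resume_hr", f"{hom_reg_dir}/{name}",
    ]


# Reproduces the AlexNet + GeM command shown above.
command = build_eval_command("alexnet", "gem")
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run to launch the evaluation.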

Visualization of self-supervised data

You can generate and visualize self-supervised data from a single image by running

python visualize_ss_data.py --image_path data/example.jpg --k 0.8

The script generates four images (notation is consistent with the paper):

./data/ss_img_source.jpg: the source image I, with the visualization of the two quadrilaterals t_x (orange) and t_y (purple) and their intersection t_z (green) as defined in the paper;
./data/ss_proj_a.jpg: the first projection I_a, with the projection t_a of the intersection (green);
./data/ss_proj_b.jpg: the second projection I_b, with the projection t_b of the intersection (green);
./data/ss_proj_intersection.jpg: the projection of the intersection.

You can change the value of k to see how this influences the training data.
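To make the role of the intersection t_z concrete, here is a minimal sketch of how the overlap of two convex quadrilaterals could be computed with Sutherland-Hodgman clipping. It is not the repository's implementation, which may use a different method or a geometry library. The names clip_convex and polygon_area are hypothetical, and the vertices are assumed to be listed counter-clockwise in a y-up frame.

```python
def clip_convex(subject, clip):
    """Clip polygon `subject` against the convex polygon `clip` (Sutherland-Hodgman)."""

    def inside(p, a, b):
        # True if p lies on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of the infinite lines through p-q and a-b.
        x1, y1 = p
        x2, y2 = q
        x3, y3 = a
        x4, y4 = b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        c1 = x1 * y2 - y1 * x2
        c2 = x3 * y4 - y3 * x4
        return ((c1 * (x3 - x4) - (x1 - x2) * c2) / den,
                (c1 * (y3 - y4) - (y1 - y2) * c2) / den)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        if not current:
            break  # The polygons do not overlap.
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Two illustrative quadrilaterals standing in for t_x and t_y.
t_x = [(0, 0), (2, 0), (2, 2), (0, 2)]
t_y = [(1, 1), (3, 1), (3, 3), (1, 3)]
t_z = clip_convex(t_x, t_y)
```

With these two squares, t_z is the unit square spanning (1, 1) to (2, 2). The area of t_z relative to t_x and t_y gives a rough measure of how much the two projections overlap.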

Example of randomly generated images (four panels): source image, projection A, projection B, projected intersection.

BibTeX

If you use this code in your project, please cite us using:

@InProceedings{Berton_ICCV_2021,
    author    = {Berton, Gabriele and Masone, Carlo and Paolicelli, Valerio and Caputo, Barbara},
    title     = {Viewpoint Invariant Dense Matching for Visual Geolocalization},
    booktitle = {ICCV},
    month     = {October},
    year      = {2021},
    pages     = {12169-12178}
}