Lightweight CNN backbones built on the following three feature extractors:
- EfficientNet
- MobileNetV2
- Knowledge distillation
  - Teacher network: ResNet50
  - Student network: ResNet18
| Backbone | Output dimension | Pitts30k-test R@1 | Pitts30k-test R@5 | Pitts30k-test R@10 | MSLS-val R@1 | MSLS-val R@5 | MSLS-val R@10 | Download |
|---|---|---|---|---|---|---|---|---|
| ResNet50 (MixVPR) | 4096 | 91.99 | 95.79 | 96.66 | 87.03 | 92.57 | 94.73 | LINK |
| EfficientNet_b0 | 128 | 81.51 | 91.83 | 94.06 | 65.00 | 78.38 | 81.89 | LINK |
| MobileNet v2 | 2048 | 86.30 | 92.99 | 94.66 | 70.14 | 81.35 | 83.65 | LINK |
| KD | 1024 | 89.00 | 94.72 | 96.07 | 81.49 | 88.92 | 91.22 | LINK |
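The R@1, R@5, and R@10 columns report Recall@K: the percentage of queries for which at least one correct database image appears among the top K retrieved candidates. The sketch below illustrates that metric on toy data. The function name `recall_at_k` and its inputs are hypothetical and are not taken from this repository's evaluation code.

```python
def recall_at_k(ranked_predictions, positives, ks=(1, 5, 10)):
    """Return {k: percentage of queries with a correct match in the top k}.

    ranked_predictions: one list of database indices per query, best match first.
    positives: one set of correct database indices per query.
    """
    if len(ranked_predictions) != len(positives):
        raise ValueError("need one ranked list and one positive set per query")
    if not positives:
        raise ValueError("need at least one query")

    hits = {k: 0 for k in ks}
    for ranked, correct in zip(ranked_predictions, positives):
        for k in ks:
            # A query is a hit at k if any of its top-k candidates is correct.
            if any(idx in correct for idx in ranked[:k]):
                hits[k] += 1

    n_queries = len(positives)
    return {k: 100.0 * hits[k] / n_queries for k in ks}


# Toy example: 4 queries, each with 3 ranked database candidates.
ranked = [
    [3, 7, 1],  # correct match at rank 1
    [5, 2, 9],  # correct match at rank 2
    [4, 6, 0],  # correct match at rank 3
    [8, 8, 8],  # no correct match
]
truth = [{3}, {2}, {0}, {1}]

recalls = recall_at_k(ranked, truth, ks=(1, 2, 3))
print(recalls)  # {1: 25.0, 2: 50.0, 3: 75.0}
```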
Train MobileNetV2, EfficientNet, and ResNet
# MobileNetV2
backbone_arch='mobilenet_v2',
pretrained=True,
layers_to_freeze=2,
layers_to_crop=[4],  # 4 crops the last resnet layer, 3 crops the 3rd, etc.
agg_arch='MixVPR',
agg_config={'in_channels': 512,
            'in_h': 20,
            'in_w': 20,
            'out_channels': 256,
            'mix_depth': 4,
            'mlp_ratio': 1,
            'out_rows': 4},
# EfficientNet_b0
backbone_arch='efficientnet_b0',
pretrained=True,
layers_to_freeze=2,
layers_to_crop=[4],  # 4 crops the last resnet layer, 3 crops the 3rd, etc.
agg_arch='MixVPR',
agg_config={'in_channels': 1280,
            'in_h': 10,
            'in_w': 10,
            'out_channels': 32,
            'mix_depth': 4,
            'mlp_ratio': 1,
            'out_rows': 4},
python main.py
Train Knowledge-Distillation
python kd_train.py
Code to load the pretrained weights is as follows:
import torch

from main import VPRModel

# Note that images must be resized to 320x320
model = VPRModel(backbone_arch='resnet50',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels': 1024,
                             'in_h': 20,
                             'in_w': 20,
                             'out_channels': 1024,
                             'mix_depth': 4,
                             'mlp_ratio': 1,
                             'out_rows': 4},
                 )

# Load the checkpoint weights and switch to inference mode
state_dict = torch.load('./LOGS/resnet50_MixVPR_4096_channels(1024)_rows(4).ckpt')
model.load_state_dict(state_dict)
model.eval()
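The checkpoint name encodes how the descriptor size follows from the aggregator settings: 4096 = 1024 channels x 4 rows. The sketch below checks that arithmetic for two of the configurations, assuming the MixVPR aggregator flattens an `out_channels` x `out_rows` matrix into the final descriptor. It also shows where `in_h = in_w = 20` may come from, assuming a ResNet50 cropped before its last block downsamples by a factor of 16. The helper names `descriptor_dim` and `feature_map_side` are hypothetical and not part of this repository.

```python
def descriptor_dim(agg_config):
    """Length of the flattened descriptor (assumed out_channels x out_rows)."""
    return agg_config["out_channels"] * agg_config["out_rows"]


def feature_map_side(image_side, backbone_stride):
    """Spatial side of the backbone feature map for a square input image."""
    return image_side // backbone_stride


# Aggregator settings copied from the ResNet50 and EfficientNet_b0 configs.
resnet50_cfg = {"in_channels": 1024, "in_h": 20, "in_w": 20,
                "out_channels": 1024, "mix_depth": 4,
                "mlp_ratio": 1, "out_rows": 4}
efficientnet_b0_cfg = {"in_channels": 1280, "in_h": 10, "in_w": 10,
                       "out_channels": 32, "mix_depth": 4,
                       "mlp_ratio": 1, "out_rows": 4}

print(descriptor_dim(resnet50_cfg))         # 4096
print(descriptor_dim(efficientnet_b0_cfg))  # 128

# Assumption: stride 16 for ResNet50 with its last block cropped.
print(feature_map_side(320, 16))            # 20
```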
Code to run the demo (demo_query.py):
python demo_query.py --model [backbone model] --query [path to query] --database [path to database]
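The internals of demo_query.py are not shown here. As a rough illustration of what a query-against-database step usually involves in visual place recognition, the sketch below ranks database descriptors by cosine similarity to a query descriptor. The function names and the toy 3-dimensional descriptors are assumptions for illustration only. Real descriptors would come from the model loaded above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_matches(query_desc, database_descs, k=3):
    """Return the indices of the k database descriptors most similar to the query."""
    q = l2_normalize(query_desc)
    scored = []
    for idx, desc in enumerate(database_descs):
        d = l2_normalize(desc)
        # The dot product of unit vectors is the cosine similarity.
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, idx))
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:k]]


# Toy descriptors standing in for model outputs.
database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.1, 0.0]

matches = top_k_matches(query, database, k=2)
print(matches)  # [0, 2]
```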
@inproceedings{ali2023mixvpr,
title={MixVPR: Feature Mixing for Visual Place Recognition},
author={Ali-bey, Amar and Chaib-draa, Brahim and Gigu{\`e}re, Philippe},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={2998--3007},
year={2023}
}