
Multi-modal Visual Place Recognition

Dataset and code for our IEEE Signal Processing Letters 2021 paper: "Multi-Modal Visual Place Recognition in Dynamics-Invariant Perception Space".

Highlights

  • We build a dynamics-invariant perception space to improve feature matching in dynamic environments; it can be seen as an extension of Empty Cities (IEEE T-RO 2020).

  • Specifically, we propose a dynamics-invariant perception network to recover the static semantics and static images directly from dynamic frames. We then design a multi-modal coding strategy to generate robust semantic-visual features for image matching (a rough sketch of this idea is given after this list).
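
To make the coding idea concrete, here is a minimal, hypothetical sketch (not the repository's implementation): it fuses an L2-normalized visual descriptor of the recovered static image with a class-frequency histogram of the recovered static semantics, and compares places by cosine similarity. All function names, the weighting scheme, and the number of classes are assumptions for illustration only.

    import numpy as np

    def semantic_histogram(seg, num_classes=19):
        """Normalized class-frequency histogram of a static semantic map (H x W array of class ids)."""
        hist = np.bincount(seg.ravel(), minlength=num_classes).astype(np.float32)
        return hist / max(hist.sum(), 1.0)

    def multimodal_descriptor(visual_feat, seg, alpha=0.5, num_classes=19):
        """Concatenate a weighted visual feature and semantic histogram, each L2-normalized."""
        v = visual_feat / (np.linalg.norm(visual_feat) + 1e-12)
        s = semantic_histogram(seg, num_classes)
        s = s / (np.linalg.norm(s) + 1e-12)
        return np.concatenate([alpha * v, (1.0 - alpha) * s])

    def match_score(desc_a, desc_b):
        """Cosine similarity between two multi-modal descriptors (higher = more likely the same place)."""
        denom = np.linalg.norm(desc_a) * np.linalg.norm(desc_b) + 1e-12
        return float(desc_a @ desc_b / denom)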

Framework

Installation

  • Install PyTorch (tested with version 1.2.0)

Dataset

  • Synthetic dataset for dynamic-to-static semantic segmentation [link] (Extraction-code: 5250).
  • Synthetic dataset for dynamic-to-static image translation [link].

Train

python train_TransNet.py --dataroot EmptycitiesDataset_path --gpu_ids 0 --name xyz --batchSize x --phase train

Test

python test_TransNet.py --gpu_ids 0 --name SegTransNet --phase testRand --epoch x --no_flip

Scripts

  • We provide training and testing scripts for static semantic segmentation.

    python train_SegNet.py --gpu_ids 0 --name xyz --batchSize x --phase trainRand --mode Seg
    python test_SegNet.py --gpu_ids 0 --name xyz --epoch x --phase testRand --mode Seg
  • We also provide evaluation scripts for image quality (L1, L2, PSNR, SSIM) and semantic segmentation performance (PA, MPA, MIoU, FWIoU). Please see "src/scripts" for more details.
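
As a rough sketch of what such evaluation scripts compute (the function names and exact formulations below are assumptions, not the repository code), PSNR and the confusion-matrix-based segmentation metrics (PA, MPA, MIoU, FWIoU) can be obtained as follows:

    import numpy as np

    def psnr(img_a, img_b, max_val=255.0):
        """Peak signal-to-noise ratio between two images of identical shape."""
        mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

    def confusion_matrix(pred, gt, num_classes):
        """num_classes x num_classes confusion matrix from predicted and ground-truth label maps."""
        mask = (gt >= 0) & (gt < num_classes)
        idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
        return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    def segmentation_metrics(conf):
        """Pixel accuracy (PA), mean pixel accuracy (MPA), mean IoU (MIoU), frequency-weighted IoU (FWIoU)."""
        conf = conf.astype(np.float64)
        tp = np.diag(conf)
        gt_per_class = conf.sum(axis=1)
        pred_per_class = conf.sum(axis=0)
        union = gt_per_class + pred_per_class - tp
        valid = union > 0
        iou = tp[valid] / union[valid]
        pa = tp.sum() / conf.sum()
        mpa = np.mean(tp[gt_per_class > 0] / gt_per_class[gt_per_class > 0])
        miou = iou.mean()
        fwiou = (gt_per_class[valid] / conf.sum() * iou).sum()
        return {"PA": pa, "MPA": mpa, "MIoU": miou, "FWIoU": fwiou}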

Citation

BibTeX:

@ARTICLE{9594697,
  author={Wu, Lin and Wang, Teng and Sun, Changyin},
  journal={IEEE Signal Processing Letters}, 
  title={Multi-Modal Visual Place Recognition in Dynamics-Invariant Perception Space}, 
  year={2021},
  volume={28},
  pages={2197-2201},
  doi={10.1109/LSP.2021.3123907}}

Acknowledgments

  • This work was supported in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant SJCX20_0035, in part by the Fundamental Research Funds for the Central Universities under Grant 3208002102D, and in part by the National Natural Science Foundation of China under Grant 61803084.