
Register two images from different viewpoints.


Image Feature Matching

"For most of us, our best camera is part of the phone in our pocket. We may take a snap of a landmark, like the Trevi Fountain in Rome, and share it with friends. By itself, that photo is two-dimensional and only includes the perspective of our shooting location. Of course, a lot of people have taken photos of that fountain. Together, we may be able to create a more complete, three-dimensional view. What if machine learning could help better capture the richness of the world using the vast amounts of unstructured image collections freely available on the internet?"

Data Description

  • data
  • download.sh sets up the data (requires a Kaggle API token at ~/.kaggle/kaggle.json)


  • Baseline: Notebook


Because GPU resources were limited, only methods based on pretrained models were tested. All models were pretrained for outdoor matching on the MegaDepth dataset.

  1. LoFTR: Detector-Free Local Feature Matching with Transformers
  • Self- and cross-attention layers in a Transformer are applied to obtain feature descriptors for both images

Architecture from: https://zju3dv.github.io/loftr/
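
To make the self- and cross-attention idea concrete, here is a minimal sketch in pure Python. It uses plain softmax (scaled dot-product) attention for readability, whereas LoFTR itself uses a linear attention variant. The feature values and the names `feats_a`, `feats_b`, and `attention` are made up for illustration and are not taken from the LoFTR code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Hypothetical 2-D features for two images (illustrative values only).
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]

# Self-attention: image A attends to its own features.
self_a = attention(feats_a, feats_a, feats_a)
# Cross-attention: image A attends to the features of image B.
cross_a = attention(feats_a, feats_b, feats_b)
```

The only difference between the two calls is where the keys and values come from: the same image for self-attention, the other image for cross-attention.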

  2. QuadTree Attention: a LoFTR-based matcher that uses QuadTree attention, which reduces the computational complexity of attention from quadratic to linear.
  • At each level, the top K patches with the highest attention scores are selected, such that at the next level, attention is only evaluated within the relevant regions corresponding to these top K patches.

Architecture from: https://arxiv.org/pdf/2201.02767.pdf
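
The top-K selection step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it handles a single query, uses raw dot products as attention scores, and only returns which fine-level positions would be attended to. The function names and the 2x2 coarse grid are assumptions made for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def children(index, coarse_width):
    """Flat fine-level indices of the 2x2 block under one coarse patch."""
    row, col = divmod(index, coarse_width)
    fine_width = 2 * coarse_width
    return [
        (2 * row + dr) * fine_width + (2 * col + dc)
        for dr in (0, 1)
        for dc in (0, 1)
    ]


def quadtree_candidates(query, coarse_keys, coarse_width, k):
    """Select the top-k coarse patches and expand them to the next level."""
    scores = [dot(query, key) for key in coarse_keys]
    selected = top_k_indices(scores, k)
    fine = sorted(i for idx in selected for i in children(idx, coarse_width))
    return selected, fine


# Hypothetical 2x2 coarse grid (4 keys); the fine level is a 4x4 grid (16 keys).
query = [1.0, 0.0]
coarse_keys = [[0.9, 0.0], [0.1, 0.0], [0.5, 0.0], [0.0, 1.0]]
selected, fine = quadtree_candidates(query, coarse_keys, coarse_width=2, k=2)
```

With K = 2, attention at the fine level is evaluated on 8 of the 16 positions instead of all of them; repeating this at every level is what keeps the cost from growing quadratically.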

  3. SuperGlue Network: a Graph Neural Network combined with an optimal matching layer, trained to match two sets of sparse image features.
  • SuperGlue operates as a "middle-end," performing context aggregation, matching, and filtering in a single end-to-end architecture.

Architecture from: https://arxiv.org/pdf/1911.11763.pdf
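
The optimal matching layer is based on the Sinkhorn algorithm. The sketch below shows the core idea under simplifying assumptions: a square score matrix, no "dustbin" row or column for unmatched points, and normalization in the probability domain rather than the log domain. The score values, the `threshold` default, and the function names are hypothetical.

```python
import math


def sinkhorn(scores, iterations=50):
    """Alternately normalize rows and columns of exp(scores)."""
    p = [[math.exp(s) for s in row] for row in scores]
    n_cols = len(p[0])
    for _ in range(iterations):
        # Row normalization: each row sums to 1.
        p = [[v / total for v in row] for row, total in ((r, sum(r)) for r in p)]
        # Column normalization: each column sums to 1.
        col_sums = [sum(row[j] for row in p) for j in range(n_cols)]
        p = [[v / col_sums[j] for j, v in enumerate(row)] for row in p]
    return p


def mutual_matches(p, threshold=0.2):
    """Keep (i, j) pairs that are each other's best match and confident enough."""
    matches = []
    for i, row in enumerate(p):
        j = max(range(len(row)), key=row.__getitem__)
        best_i = max(range(len(p)), key=lambda r: p[r][j])
        if best_i == i and row[j] > threshold:
            matches.append((i, j))
    return matches


# Hypothetical similarity scores between 3 keypoints in each image.
scores = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.5], [0.5, 0.0, 2.0]]
assignment = sinkhorn(scores)
matches = mutual_matches(assignment)
```

In this sketch the soft assignment matrix ends up approximately doubly stochastic, so each keypoint distributes one unit of "matching mass" across the other image.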

  4. DKM: Deep Kernelized Dense Geometric Matching
  • Performs a global fixed-size correlation, followed by flattening and convolution, to predict correspondences.

Result Examples

LoFTR with concatenated correspondence points from the dual-softmax and optimal-transport pretrained models on outdoor scenes
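
Concatenating correspondences from two models can be sketched as below. Each match is assumed to be a tuple `(x0, y0, x1, y1)` in pixel coordinates; the de-duplication by rounding is an optional extra added for this example, and the function name and sample values are hypothetical.

```python
def concat_matches(*match_sets, decimals=1):
    """Concatenate match lists, dropping near-duplicate correspondences."""
    seen = set()
    merged = []
    for matches in match_sets:
        for match in matches:
            key = tuple(round(v, decimals) for v in match)
            if key not in seen:
                seen.add(key)
                merged.append(match)
    return merged


# Hypothetical outputs of the two pretrained models for the same image pair.
dual_softmax = [(10.0, 20.0, 12.0, 21.0), (30.0, 40.0, 33.0, 41.0)]
optimal_transport = [(10.0, 20.0, 12.0, 21.0), (50.0, 60.0, 52.0, 63.0)]

merged = concat_matches(dual_softmax, optimal_transport)
```

The merged set would typically be passed to a robust estimator afterwards, so occasional wrong matches from either model can still be rejected as outliers.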

LoFTR with correspondence points from augmented image pairs, using the dual-softmax pretrained model on outdoor scenes
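
When matching augmented copies of an image pair, the resulting keypoints have to be mapped back to the original image coordinates before they can be combined. The README does not say which augmentations were used; the sketch below assumes horizontal flips and rescaling purely as examples, with matches stored as `(x0, y0, x1, y1)` tuples. All names here are hypothetical.

```python
def unflip_matches(matches, width0, width1):
    """Map matches found on horizontally flipped images back to the originals."""
    return [
        ((width0 - 1) - x0, y0, (width1 - 1) - x1, y1)
        for x0, y0, x1, y1 in matches
    ]


def unscale_matches(matches, scale0, scale1):
    """Map matches found on resized images back to original pixel coordinates."""
    return [
        (x0 / scale0, y0 / scale0, x1 / scale1, y1 / scale1)
        for x0, y0, x1, y1 in matches
    ]


# Hypothetical matches found on flipped images of width 640 and 100 pixels.
flipped = [(0.0, 5.0, 99.0, 7.0)]
restored_flip = unflip_matches(flipped, width0=640, width1=100)

# Hypothetical matches found after resizing image 0 by 0.5 and image 1 by 2.0.
scaled = [(100.0, 50.0, 30.0, 60.0)]
restored_scale = unscale_matches(scaled, scale0=0.5, scale1=2.0)
```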

LoFTR with QuadTree Attention pretrained models on outdoor scenes


Code adapted and used from:


  • LoFTR: Detector-Free Local Feature Matching with Transformers. Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou.
  • SuperGlue: Learning Feature Matching with Graph Neural Networks. Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich. CVPR, 2020. https://arxiv.org/abs/1911.11763
  • QuadTree Attention for Vision Transformers. Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan.
  • Deep Kernelized Dense Geometric Matching. Johan Edstedt, Mårten Wadenbäck, Michael Felsberg. arXiv preprint arXiv:2202.00667.
  • A case for using rotation invariant features in state of the art feature matchers. Georg Bökman, Fredrik Kahl.
  • General E(2)-Equivariant Steerable CNNs. Maurice Weiler, Gabriele Cesa. Conference on Neural Information Processing Systems (NeurIPS).
