Content-Aware Unsupervised Deep Homography Estimation

Primary language: Python. License: MIT.

Content-Aware Unsupervised Deep Homography Estimation paper

Homography estimation is a basic image alignment method in many applications. It is usually done by extracting and matching sparse feature points, which is error-prone in low-light and low-texture images. On the other hand, previous deep homography approaches use either synthetic images for supervised learning or aerial images for unsupervised learning, both ignoring the importance of handling depth disparities and moving objects in real-world applications. To overcome these problems, in this work we propose an unsupervised deep homography method with a new architecture design. In the spirit of the RANSAC procedure in traditional methods, we specifically learn an outlier mask to select only reliable regions for homography estimation. We calculate the loss with respect to our learned deep features instead of directly comparing image content as was done previously. To achieve unsupervised training, we also formulate a novel triplet loss customized for our network. We validate our method by conducting comprehensive comparisons on a new dataset that covers a wide range of scenes with varying degrees of difficulty for the task. Experimental results reveal that our method outperforms the state of the art, including deep solutions and feature-based solutions.
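As a rough illustration of the mask-weighted feature loss and the triplet-style objective described above, here is a minimal pure-Python sketch. It is not the code in this repository: the function names, the flattened-list representation of feature maps and masks, and the weight `lam` are all assumptions made for the example.

```python
def masked_l1(feat_x, feat_y, mask_x, mask_y):
    """Mask-weighted mean absolute difference between two flattened feature maps.

    Positions where either mask is close to 0 (treated as outliers) contribute little.
    """
    num = 0.0
    den = 0.0
    for fx, fy, mx, my in zip(feat_x, feat_y, mask_x, mask_y):
        w = mx * my
        num += w * abs(fx - fy)
        den += w
    return num / max(den, 1e-8)  # guard against an all-zero mask


def plain_l1(feat_x, feat_y):
    """Unweighted mean absolute difference between two flattened feature maps."""
    return sum(abs(a - b) for a, b in zip(feat_x, feat_y)) / len(feat_x)


def triplet_style_loss(warped_feat_a, feat_a, feat_b, warped_mask_a, mask_b, lam=2.0):
    """Pull warped features of image a toward image b while keeping the
    unwarped features of a and b apart. `lam` is an assumed weight."""
    aligned = masked_l1(warped_feat_a, feat_b, warped_mask_a, mask_b)
    unaligned = plain_l1(feat_a, feat_b)
    return aligned - lam * unaligned
```

The negative term discourages the trivial solution in which the feature extractor outputs the same value everywhere, since identical features would make the aligned term zero without any real alignment.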

Scores

               RE    LT    LL    SF    LF    Avg
Coordinate     1.81  1.90  1.94  1.75  1.72  1.82
Coordinate-v2  0.73  1.01  1.03  0.92  0.70  0.88
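This README does not state how the scores are computed. The sketch below assumes they are mean L2 distances between labeled target points and source points warped by the estimated homography; the helper names `warp_point` and `mean_point_error` are hypothetical and are not taken from test.py.

```python
import math


def warp_point(H, x, y):
    """Apply a 3x3 homography (nested lists) to a 2D point, with perspective division."""
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xs / w, ys / w


def mean_point_error(H, src_pts, dst_pts):
    """Average Euclidean distance between warped source points and their targets."""
    total = 0.0
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        px, py = warp_point(H, x, y)
        total += math.hypot(px - u, py - v)
    return total / len(src_pts)
```

Under this assumption a lower score means better alignment, and the Avg column is the mean of the five per-category scores.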

Installation

Requirements

  • Python 3.6
  • PyTorch 1.0.1 (1.2.0)
  • torchvision 0.2.2
  • tensorboardX 1.9
git clone https://github.com/JirongZhang/DeepHomography.git
cd DeepHomography

Data pre-processing

  1. Download raw data
# Google Drive
https://drive.google.com/file/d/19d2ylBUPcMQBb_MNBBGl9rCAS7SU-oGm/view?usp=sharing
# BaiduYun
https://pan.baidu.com/s/1Dkmz4MEzMtBx-T7nG0ORqA (key: gvor)
  2. Data processing
  • Put "models/Coordinate/Train/Test" in the corresponding folder
python video2img.py

Train

Our model is designed for small-baseline real data. Here, we provide the "Oneline" model, which predicts H_ab directly. It also uses the triplet loss to optimize the network. It achieves almost comparable performance and is much easier to optimize, so we use this version for now. Thanks to @Daniel for the accurate loss function. The formula can be simplified as:

  1. Oneline training from scratch
python train.py --gpus 2 --cpus 8 --lr 0.0001 --batch_size 32
  2. Oneline two-stage version

Please set the mask to all ones at the beginning (details in lines 277-281 of resnet.py).

python train.py --gpus 2 --cpus 8 --lr 0.0001 --batch_size 32

Once the feature extractor has learned stable features, i.e. after at least 2 epochs, fine-tune the network with the mask predictor involved, using a smaller learning rate.

python train.py --gpus 2 --cpus 8 --lr 0.000064 --batch_size 32 --finetune True
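The two-stage schedule above can be summarized in a small sketch. This is only an illustration: in this repository the stages are separate runs of train.py and the mask is changed in resnet.py, so the helpers `stage_settings` and `effective_mask` and the `warmup_epochs` parameter are hypothetical.

```python
def stage_settings(epoch, warmup_epochs=2, base_lr=0.0001, finetune_lr=0.000064):
    """Return (learning_rate, use_mask_predictor) for a 0-based epoch index.

    Stage 1 trains the feature extractor with the mask disabled;
    stage 2 fine-tunes with the mask predictor at a smaller learning rate.
    """
    if epoch < warmup_epochs:
        return base_lr, False
    return finetune_lr, True


def effective_mask(predicted_mask, use_mask_predictor):
    """Replace the predicted mask with all ones while the mask predictor is disabled."""
    if use_mask_predictor:
        return predicted_mask
    return [[1.0 for _ in row] for row in predicted_mask]
```

With an all-ones mask every pixel contributes equally to the loss, so the first stage trains the features without any outlier rejection.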

If you want to try the "Doubleline" version, please add the other half of the loss and use getBatchHLoss() in utils.py to add the H loss. If you have any questions, please contact us.
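For intuition about the H loss, the sketch below assumes it penalizes how far the product of the forward and backward homographies is from the identity. It is not the implementation of getBatchHLoss() in utils.py; the function names and the mean-absolute-difference form are assumptions.

```python
def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def h_consistency_loss(H_ab, H_ba):
    """Mean absolute deviation of H_ab * H_ba from the 3x3 identity.

    If H_ba is the exact inverse of H_ab, the loss is zero.
    """
    P = matmul3(H_ab, H_ba)
    total = 0.0
    for i in range(3):
        for j in range(3):
            target = 1.0 if i == j else 0.0
            total += abs(P[i][j] - target)
    return total / 9.0
```

In a two-branch setup, a term like this would be added to the sum of the two directional losses so that the predicted H_ab and H_ba stay consistent with each other.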

Test

python test.py

Release History

  • 2021.3.14
    • More accurate coordinates for the test set are released. They use traditional feature descriptors for pre-matching, which greatly reduces the error caused by purely manual marking.
  • 2020.8.4
    • Sorry for the wait. We have uploaded the code and dataset. Please read the final version of our paper, which adds more discussion.
  • 2020.7.3
    • Our paper has been accepted by ECCV 2020 as an oral presentation.
  • 2019.11.22
    • We will upload the code and model after this paper has been accepted.
  • 2019.9.12

Meta

ZHANG Jirong – zhangjirong.dgt@gmail.com or zhangjirong@std.uestc.edu.cn

All code is provided for research purposes only and without any warranty. Any commercial use requires our consent. If you use this code or ideas from the paper for your research, please cite our paper:

@inproceedings{zhang2020content,
  title={Content-aware unsupervised deep homography estimation},
  author={Zhang, Jirong and Wang, Chuan and Liu, Shuaicheng and Jia, Lanpeng and Ye, Nianjin and Wang, Jue and Zhou, Ji and Sun, Jian},
  booktitle={European Conference on Computer Vision},
  pages={653--669},
  year={2020},
  organization={Springer}
}
