This repository contains the implementation of the following paper:
"D2-Net: A Trainable CNN for Joint Detection and Description of Local Features".
M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler. CVPR 2019.
Python 3.6+ is recommended for running our code. Conda can be used to install the required packages:
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
conda install h5py imageio imagesize matplotlib numpy scipy tqdm
The off-the-shelf Caffe VGG16 weights and their tuned counterpart can be downloaded by running:
mkdir models
wget https://dsmn.ml/files/d2-net/d2_ots.pth -O models/d2_ots.pth
wget https://dsmn.ml/files/d2-net/d2_tf.pth -O models/d2_tf.pth
wget https://dsmn.ml/files/d2-net/d2_tf_no_phototourism.pth -O models/d2_tf_no_phototourism.pth
Update - 23 May 2019: We have added a new set of weights trained on MegaDepth without the PhotoTourism scenes (sagrada_familia - 0019, lincoln_memorial_statue - 0021, british_museum - 0024, london_bridge - 0025, us_capitol - 0078, mount_rushmore - 1589). Our initial results show similar performance. To use these weights at test time, add --model_file models/d2_tf_no_phototourism.pth.
extract_features.py can be used to extract D2 features for a given list of images. The single-scale features require less than 6GB of VRAM for 1200x1600 images. The --multiscale flag can be used to extract multiscale features; for this, we recommend at least 16GB of VRAM.
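The image list itself can be prepared with a few lines of Python. The sketch below assumes the list is a plain text file with one image path per line, which this README does not spell out; write_image_list, the extension filter, and the file names are hypothetical and only serve as an illustration.

```python
import os
import tempfile


def write_image_list(image_dir, list_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write the paths of all images in image_dir to list_path, one per line."""
    names = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(extensions)
    )
    with open(list_path, "w") as f:
        for name in names:
            f.write(os.path.join(image_dir, name) + "\n")
    return len(names)


# Demo on a temporary directory holding two empty "images" and one other file.
demo_dir = tempfile.mkdtemp()
for name in ("b.png", "a.jpg", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

list_path = os.path.join(demo_dir, "images.txt")
count = write_image_list(demo_dir, list_path)

with open(list_path) as f:
    listed = [line.strip() for line in f]
```

The resulting file can then be passed to extract_features.py through --image_list_file.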
The output format can be either npz or mat. In either case, the feature files encapsulate three arrays:
keypoints: [N x 3] array containing the positions of keypoints x, y and the scales s. The positions follow the COLMAP format, with the X axis pointing to the right and the Y axis to the bottom.
scores: [N] array containing the activations of keypoints (higher is better).
descriptors: [N x 512] array containing the L2 normalized descriptors.
python extract_features.py --image_list_file images.txt (--multiscale)
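The three arrays described above can be read back with NumPy when the npz format is used. The following is a minimal sketch rather than part of the repository: it writes a synthetic file with random values in the documented layout (the file name example_features.npz is hypothetical), loads it, and runs a simple mutual nearest neighbor matcher. Because the descriptors are L2 normalized, a dot product gives their cosine similarity. The matcher is an illustrative choice, not necessarily the matching strategy used in the paper.

```python
import os
import tempfile

import numpy as np

# Build a small synthetic feature file with the three documented arrays.
# Real files come from extract_features.py; the values here are random.
rng = np.random.default_rng(0)
n = 5
keypoints = np.column_stack([
    rng.uniform(0, 1600, n),  # x, X axis pointing to the right
    rng.uniform(0, 1200, n),  # y, Y axis pointing to the bottom
    np.ones(n),               # s, scale
]).astype(np.float32)
scores = rng.uniform(0, 1, n).astype(np.float32)
descriptors = rng.normal(size=(n, 512)).astype(np.float32)
descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)

path = os.path.join(tempfile.mkdtemp(), "example_features.npz")  # hypothetical name
np.savez(path, keypoints=keypoints, scores=scores, descriptors=descriptors)

# Load the file and copy out the arrays.
with np.load(path) as features:
    loaded_keypoints = features["keypoints"]      # [N x 3]
    loaded_scores = features["scores"]            # [N]
    loaded_descriptors = features["descriptors"]  # [N x 512]

# Keypoint indices sorted from strongest to weakest activation.
order = np.argsort(-loaded_scores)


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return index pairs (i, j) that are each other's nearest neighbor."""
    similarity = desc_a @ desc_b.T     # cosine similarity for unit-norm rows
    nn_ab = similarity.argmax(axis=1)  # best match in b for each row of a
    nn_ba = similarity.argmax(axis=0)  # best match in a for each row of b
    ids_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == ids_a
    return np.column_stack([ids_a[mutual], nn_ab[mutual]])


# Matching a descriptor set against itself pairs every keypoint with itself.
matches = mutual_nearest_neighbors(loaded_descriptors, loaded_descriptors)
```

For two different images, the same function would be called with the descriptors loaded from each image's feature file.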
The training pipeline provided here is a PyTorch implementation of the TensorFlow code that was used to train the model available to download above.
Update - 05 June 2019 We have fixed a bug in the dataset preprocessing - retraining now yields similar results to the original TensorFlow implementation.
After downloading the entire MegaDepth dataset (including SfM models), preprocess_megadepth.sh can be used to retrieve the camera parameters and compute the overlap between images for all scenes.
cd megadepth_utils
bash preprocess_megadepth.sh /local/dataset/megadepth /local/dataset/megadepth/scenes_info
After downloading and preprocessing MegaDepth, the training can be started right away:
bash prepare_for_training.sh
python train.py --use_validation --dataset_path /local/dataset/megadepth --scene_info_path /local/dataset/megadepth/scenes_info
If you use this code in your project, please cite the following paper:
@InProceedings{Dusmanu2019CVPR,
author = {Dusmanu, Mihai and Rocco, Ignacio and Pajdla, Tomas and Pollefeys, Marc and Sivic, Josef and Torii, Akihiko and Sattler, Torsten},
title = {{D2-Net: A Trainable CNN for Joint Detection and Description of Local Features}},
booktitle = {Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2019},
}