mihaidusmanu/d2-net

What is the difference between "d2_ots.pth", "d2_tf.pth", and "d2_tf_no_phototourism.pth"?

Dear Mihai, first of all, thank you for your great work.
1. What is the difference between "d2_ots.pth", "d2_tf.pth", and "d2_tf_no_phototourism.pth"?
2. Could you open-source a version of D2-Net based on a ResNet model?

Thank you!

Hello. d2_ots.pth is the off-the-shelf model, i.e. the Caffe ImageNet pretrained weights. d2_tf.pth is d2_ots.pth fine-tuned on the full MegaDepth dataset using the loss from our paper. d2_tf_no_phototourism.pth is also d2_ots.pth fine-tuned on MegaDepth, but with the scenes that MegaDepth shares with PhotoTourism left out.

As we mentioned in the paper, vanilla ResNets start by rapidly down-sampling the input image, which makes them less suitable for this application. We therefore did not investigate them further, and we do not have any trained ResNet weights to release. If you want to use pretrained ResNet weights anyway, you will have to replace the test model (see below) with a subset of the ResNet layers from torchvision.models.ResNet**.
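To make the down-sampling point concrete, here is a minimal sketch in plain Python that traces the size of one image axis through the spatial layers of the test model in lib/model_test.py and through the first two layers of a torchvision-style ResNet. The helper names (`out_size`, `trace`), the layer tables, and the 224-pixel input are assumptions made for illustration and are not part of the d2-net code base. The size formula is the floor formula documented for PyTorch's convolution and pooling layers.

```python
def out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis for a conv/pool layer (PyTorch floor formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace(size, layers):
    """Return the size after every layer, starting with the input size."""
    sizes = [size]
    for kernel, stride, padding, dilation in layers:
        sizes.append(out_size(sizes[-1], kernel, stride, padding, dilation))
    return sizes


# (kernel, stride, padding, dilation); ReLU layers are omitted because they
# do not change the spatial size.
CONV = (3, 1, 1, 1)      # nn.Conv2d(..., 3, padding=1)
MAXPOOL = (2, 2, 0, 1)   # nn.MaxPool2d(2, stride=2)
AVGPOOL = (2, 1, 0, 1)   # nn.AvgPool2d(2, stride=1)
DILATED = (3, 1, 2, 2)   # nn.Conv2d(..., 3, padding=2, dilation=2)

# Spatial layers of the test model, in order.
D2_TEST_MODEL = [
    CONV, CONV, MAXPOOL,
    CONV, CONV, MAXPOOL,
    CONV, CONV, CONV, AVGPOOL,
    DILATED, DILATED, DILATED,
]

# First two layers of a standard ResNet: 7x7 stride-2 conv, 3x3 stride-2 max-pool.
RESNET_STEM = [(7, 2, 3, 1), (3, 2, 1, 1)]

d2_sizes = trace(224, D2_TEST_MODEL)
resnet_sizes = trace(224, RESNET_STEM)

print(d2_sizes)      # [224, 224, 224, 112, 112, 112, 56, 56, 56, 56, 55, 55, 55, 55]
print(resnet_sizes)  # [224, 112, 56]
```

Under these assumptions the ResNet is already at a quarter of the input resolution after its first two layers, whereas the test model applies four convolutions at full and half resolution before it gets there, and never goes below roughly a quarter.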

d2-net/lib/model_test.py

Lines 10 to 33 in 8198366

self.model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2, stride=2),
    nn.Conv2d(64, 128, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2, stride=2),
    nn.Conv2d(128, 256, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.AvgPool2d(2, stride=1),
    nn.Conv2d(256, 512, 3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, 3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, 3, padding=2, dilation=2),
)

Thank you!