mihaidusmanu/d2-net

Question about preparing the training data.

XuyangBai opened this issue · 2 comments

Hi, thanks for sharing your work. I have a question about your paper. In Section 4.3 (Implementation details), you mention that:

For each pair, we selected a random 256 × 256 crop centered around one correspondence. We use a batch size of 1 and make sure that the training pairs present at least 128 correspondences in order to obtain meaningful gradients.

In my understanding, your model takes pairs of images and pixel-wise correspondences as input. You then densely extract a feature vector for each pixel and compute the loss from the correspondence information. Why, then, do you need to crop the images instead of using the original images? Is it because of memory constraints? And what do you mean by a random 256 × 256 crop centered around one correspondence? Around which correspondence?

After that, do you choose a fixed number of pixel-wise correspondences as your positive pairs, or do you use all of them? And how can you guarantee that the two cropped images have at least 128 corresponding pixel pairs?

Thanks a lot!

Hello. Please find the answers inlined below.

Why, then, do you need to crop the images instead of using the original images?

You are right: we use crops mainly for memory reasons. Another reason is that the training images come in a wide range of resolutions.

And what do you mean by a random 256 × 256 crop centered around one correspondence? Around which correspondence?

The pipeline for dataset selection first picks a correspondence from the sparse 3D model and then crops a 256x256 area around it.
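For readers who want to picture this step, here is a minimal NumPy sketch of cropping both images of a pair around one randomly chosen correspondence. It is not the repository's code: the function names `crop_around` and `crop_pair`, the `(x, y)` point layout, and the choice to shift the window so it stays inside the image are all assumptions made for illustration.

```python
import numpy as np

def crop_around(image, center_xy, size=256):
    """Return a size x size crop roughly centered on center_xy and its top-left offset.

    Assumption: near the border the window is shifted so it stays inside the
    image; the thread does not say how borders are actually handled.
    """
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    cx, cy = int(round(center_xy[0])), int(round(center_xy[1]))
    # Clamp the top-left corner so the full window fits in the image.
    x0 = min(max(cx - size // 2, 0), w - size)
    y0 = min(max(cy - size // 2, 0), h - size)
    return image[y0:y0 + size, x0:x0 + size], (x0, y0)

def crop_pair(image1, image2, pts1, pts2, rng, size=256):
    """Pick one correspondence at random and crop both images around it.

    pts1[k] and pts2[k] are the (x, y) positions of the same 3D point
    in image1 and image2 (hypothetical layout).
    """
    k = rng.integers(len(pts1))
    crop1, off1 = crop_around(image1, pts1[k], size)
    crop2, off2 = crop_around(image2, pts2[k], size)
    return crop1, off1, crop2, off2
```

Returning the offsets makes it possible to express the remaining correspondences in crop coordinates afterwards.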

After that, do you choose a fixed number of pixel-wise correspondences as your positive pairs, or do you use all of them? And how can you guarantee that the two cropped images have at least 128 corresponding pixel pairs?

We use all pixel-wise correspondences for computing the loss. Since we are using a batch size of 1, we simply skip the image pairs with fewer than 128 correspondences.

d2-net/lib/loss.py

Lines 75 to 76 in be09ec7

if ids.size(0) < 128:
    continue
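The same skip-if-too-few idea can be sketched outside the loss function. The snippet below is an illustration only, not the code in `lib/loss.py`: `correspondences_in_crops`, `usable_pairs`, and the `(pts1, pts2, off1, off2)` tuple layout are hypothetical names and conventions, and the correspondences are assumed to be given as `(x, y)` pixel positions in the full images.

```python
import numpy as np

MIN_CORRESPONDENCES = 128  # threshold quoted in the thread

def correspondences_in_crops(pts1, pts2, off1, off2, size=256):
    """Keep correspondences whose two endpoints fall inside both crops.

    off1 and off2 are the (x, y) top-left corners of the crops; the returned
    points are expressed in crop coordinates.
    """
    p1 = np.asarray(pts1, dtype=float) - np.asarray(off1, dtype=float)
    p2 = np.asarray(pts2, dtype=float) - np.asarray(off2, dtype=float)
    inside = (
        (p1 >= 0).all(axis=1) & (p1 < size).all(axis=1)
        & (p2 >= 0).all(axis=1) & (p2 < size).all(axis=1)
    )
    return p1[inside], p2[inside]

def usable_pairs(pairs, size=256):
    """Yield only the pairs that keep at least MIN_CORRESPONDENCES matches."""
    for pts1, pts2, off1, off2 in pairs:
        p1, p2 = correspondences_in_crops(pts1, pts2, off1, off2, size)
        if len(p1) < MIN_CORRESPONDENCES:
            continue  # too few correspondences: skip this pair
        yield p1, p2
```

With a batch size of 1, skipping a pair simply means that no gradient step is taken for it, so no guarantee is needed at crop time.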

Thanks a lot!