mihaidusmanu/d2-net

Whether to use ReLU for the last layer


Hi, in the supplementary material of your paper, I found the following statement:

We noticed that ReLU has a significant negative impact on the off-the-shelf descriptors, but not on the fine-tuned ones. Thus, we report results without ReLU for the off-the-shelf model and with ReLU for the fine-tuned one.

So does this mean that for the off-the-shelf model you use the features before ReLU to calculate the detection score, while for the fine-tuned model you use the features after ReLU to calculate the score and train the network? Is my understanding right?

And could you please offer some insight into why this ReLU layer matters? I think that if no ReLU is used, the score might be less than 0 (if a pixel has a feature vector with all elements less than zero).

Thanks.

Hello. The paragraph you quoted only refers to the descriptors (and not the hard / soft detection). As we mentioned, for the off-the-shelf descriptors, considering the descriptors before ReLU yields better matches. In the fine-tuned version, there's almost no difference between ReLU / no-ReLU in terms of matching performance - we suspect this is because the descriptor performance has already saturated (with the current setup - architecture / loss function / training data).
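To make the distinction concrete, here is a minimal sketch (my own illustrative example, not the repository code) of the two descriptor variants, assuming a dense feature map taken from the last convolutional layer before its ReLU:

```python
import torch
import torch.nn.functional as F

# Hypothetical dense feature map from the last conv layer, before its ReLU.
features = torch.randn(1, 512, 64, 64)  # (B, C, H, W)

# Off-the-shelf model: descriptors taken before ReLU, then L2-normalized.
desc_before_relu = F.normalize(features, p=2, dim=1)

# Fine-tuned model: descriptors taken after ReLU, then L2-normalized.
desc_after_relu = F.normalize(F.relu(features), p=2, dim=1)
```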

For the hard detection, ReLU / no-ReLU made almost no difference in either case. As you suspected, at train time (soft detection), we used ReLU in order to avoid negative terms in the weighted average (see the code snippet below). During the changes for the camera-ready version, I somehow forgot to mention this important detail - I will add it in the next version of the arXiv paper. Sorry for the inconvenience.

batch = F.relu(batch)
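For context, here is a minimal sketch of how a soft detection score can be computed over a dense feature map, with ReLU applied first so that all terms entering the weighted average are non-negative. The window size and the epsilon are illustrative assumptions, not the repository's exact values:

```python
import torch
import torch.nn.functional as F

def soft_detection_score(batch, window=3, eps=1e-8):
    """Sketch of a soft detection score for a dense feature map.

    batch: (B, C, H, W) feature map. ReLU is applied first so that all
    terms entering the weighted average are non-negative.
    """
    b = batch.size(0)
    batch = F.relu(batch)  # avoid negative terms in the weighted average

    # Numerical stability: scale by the per-sample maximum before exponentiating.
    max_per_sample = torch.max(batch.view(b, -1), dim=1)[0].view(b, 1, 1, 1)
    exp = torch.exp(batch / (max_per_sample + eps))

    # Spatial softness: soft local-max over a window x window neighbourhood.
    sum_exp = window ** 2 * F.avg_pool2d(
        F.pad(exp, [window // 2] * 4, mode='constant', value=1.0),
        window, stride=1
    )
    local_max_score = exp / sum_exp

    # Channel softness: ratio to the depth-wise maximum.
    depth_wise_max = torch.max(batch, dim=1, keepdim=True)[0]
    depth_wise_max_score = batch / (depth_wise_max + eps)

    # Combine, take the best channel, and normalise over the image so the
    # scores can be used as weights in a weighted average.
    all_scores = local_max_score * depth_wise_max_score
    score = torch.max(all_scores, dim=1)[0]
    score = score / torch.sum(score.view(b, -1), dim=1).view(b, 1, 1)
    return score
```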

Thanks a lot for your explanation!