Multiply distances by 10 before NCA loss?
reppertj opened this issue · 1 comment
I really enjoyed the paper, especially the visualizations that accompanied your argument. You made a pretty persuasive case about the dynamics of backprop on hard negative triplets.
I'm curious why, in the SCTLoss implementation, you multiply the distances by 10 here before the NCA loss. What was behind choosing this scalar? I imagine it has something to do with putting the two loss functions on an even footing, but I can't quite see it.
Line 104 in f5f3923
Also, how did you decide on the 0.8 threshold for cosine similarity in your "selection" function? What tends to happen if you leave this out?
Lines 87 to 88 in f5f3923
Thanks—this is a great paper.
Ah, of course, that's the temperature parameter!
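For anyone else reading this: multiplying the scores by 10 before the softmax is the same as dividing them by a temperature of T = 0.1. Below is a minimal sketch of an NCA-style loss with that scale, assuming the scores are cosine similarities (higher means closer). If the inputs are distances, negate them first. The function name `nca_loss` and its signature are hypothetical, for illustration only, and are not the repository's SCTLoss code.

```python
import math


def nca_loss(sim_pos, sim_negs, scale=10.0):
    """Hypothetical NCA-style loss for one anchor.

    Returns -log softmax of the positive among [positive] + negatives,
    with every similarity multiplied by `scale` (i.e. divided by a
    temperature T = 1 / scale) before the softmax.
    """
    # Scale the similarities; the positive is at index 0.
    logits = [scale * sim_pos] + [scale * s for s in sim_negs]
    # Log-sum-exp with the max subtracted, for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive.
    return log_sum - logits[0]


if __name__ == "__main__":
    # Same similarities, two different scales.
    print(nca_loss(0.9, [0.5], scale=1.0))
    print(nca_loss(0.9, [0.5], scale=10.0))
```

Cosine similarities lie in [-1, 1], so with `scale=1.0` the logits differ by at most 2 and the softmax can never get close to one-hot. With a single negative, the loss is still about 0.127 even when the positive is at +1 and the negative at -1. A larger scale (a lower temperature) sharpens the softmax, so well-separated pairs can drive the loss toward zero, and the gradient concentrates on the hardest negatives.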