littleredxh/HardNegative

Multiply distances by 10 before NCA loss?

reppertj opened this issue · 1 comment

I really enjoyed the paper, especially the visualizations that accompanied your argument. You made a pretty persuasive case about the dynamics of backprop on hard negative triplets.

I'm curious why, in the SCTLoss implementation, you multiply the distances by 10 here before the NCA loss; how did you arrive at and validate this scalar? I imagine it has something to do with putting the two loss functions on an even footing, but I can't quite see it.

```python
loss_easytriplet = -F.log_softmax(Triplet_val[EasyTripletMask,:]/0.1, dim=1)[:,0].sum()
```
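For context, here is a minimal runnable sketch of how I read that line; the shape of `Triplet_val` and the convention that column 0 holds the anchor-positive similarity are my assumptions, not something I pulled from the repo.

```python
import torch
import torch.nn.functional as F

# Assumed layout (my guess): one row per easy triplet, column 0 = anchor-positive
# cosine similarity, column 1 = anchor-negative cosine similarity.
Triplet_val = torch.tensor([[0.70, 0.50],
                            [0.60, 0.55]])

temperature = 0.1  # dividing by 0.1 is the "multiply by 10" in the question

# NCA-style objective: maximize the softmax probability assigned to the positive.
loss_easytriplet = -F.log_softmax(Triplet_val / temperature, dim=1)[:, 0].sum()
print(loss_easytriplet)
```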

Also, how did you decide on the 0.8 threshold for cosine similarity in your "selection" function? What tends to happen if you leave this out?

```python
HardTripletMask = ((Neg>Pos) | (Neg>0.8)) & Mask_valid
EasyTripletMask = ((Neg<Pos) & (Neg<0.8)) & Mask_valid
```
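To make the question concrete, here is how I read that selection step as a self-contained sketch; `Pos`, `Neg`, and `Mask_valid` are assumed to be per-triplet cosine similarities and a validity mask, which is my interpretation rather than anything confirmed from the repo.

```python
import torch

# Assumed inputs (made-up values, my reading of what the repo computes):
Pos = torch.tensor([0.90, 0.60, 0.90])         # anchor-positive cosine similarity
Neg = torch.tensor([0.95, 0.50, 0.85])         # anchor-negative cosine similarity
Mask_valid = torch.tensor([True, True, True])  # triplets that are usable at all

threshold = 0.8

# Hard: the negative is closer than the positive, OR simply very close (> 0.8).
HardTripletMask = ((Neg > Pos) | (Neg > threshold)) & Mask_valid
# Easy: the negative is farther than the positive AND not too close (< 0.8).
EasyTripletMask = ((Neg < Pos) & (Neg < threshold)) & Mask_valid

print(HardTripletMask)  # tensor([ True, False,  True])
print(EasyTripletMask)  # tensor([False,  True, False])
# The third triplet is the one the 0.8 threshold changes: Neg < Pos, but the
# negative is still very close to the anchor, so it is routed to the hard branch.
```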

Thanks—this is a great paper.

Ah, of course, that's the temperature parameter!
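In case it helps anyone who lands here later, a quick made-up numeric check of what that temperature does: scaling the similarities by 1/0.1 before the softmax means a small positive-vs-negative margin still produces a sizeable gradient.

```python
import torch
import torch.nn.functional as F

# Made-up similarities: the positive only barely beats the negative (0.70 vs 0.65).
for temperature in (1.0, 0.1):
    sims = torch.tensor([[0.70, 0.65]], requires_grad=True)
    loss = -F.log_softmax(sims / temperature, dim=1)[0, 0]
    loss.backward()
    print(temperature, loss.item(), sims.grad)

# temperature 1.0 -> loss ≈ 0.67, gradients ≈ (-0.49, +0.49)
# temperature 0.1 -> loss ≈ 0.47, gradients ≈ (-3.78, +3.78)
# The 1/temperature factor scales the gradient, so near-ties between the
# positive and negative still produce a strong training signal.
```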