Multiply distances by 10 before NCA loss?

I really enjoyed the paper, especially the visualizations that accompanied your argument. You made a pretty persuasive case about the dynamics of backprop on hard negative triplets.

Looking at the SCTLoss implementation, I'm curious why you multiply the distances by 10 (that is, divide by 0.1) here before the NCA loss. How did you arrive at this scalar? I imagine it has something to do with putting the two loss functions on an even footing, but I can't quite see it.

HardNegative/_code/Loss.py

Line 104 in f5f3923

    
           loss_easytriplet = -F.log_softmax(Triplet_val[EasyTripletMask,:]/0.1, dim=1)[:,0].sum()

Also, how did you decide on the 0.8 threshold for cosine similarity in your "selection" function? What tends to happen if you leave this out?

HardNegative/_code/Loss.py

Lines 87 to 88 in f5f3923

    
    HardTripletMask = ((Neg>Pos) | (Neg>0.8)) & Mask_valid
    EasyTripletMask = ((Neg<Pos) & (Neg<0.8)) & Mask_valid
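To make the second question concrete, here is a rough plain-Python sketch of what the two mask expressions do to individual triplets, with and without the 0.8 threshold. It is not the repository's code: `split_triplets` and the similarity values are hypothetical, and every triplet is assumed valid (so `Mask_valid` is left out). It only shows which triplets change groups, not how training is affected.

```python
def split_triplets(pos, neg, threshold=0.8):
    """Return (hard, easy) index lists, mirroring the two mask expressions."""
    hard, easy = [], []
    for i, (p, n) in enumerate(zip(pos, neg)):
        if n > p or n > threshold:        # HardTripletMask condition
            hard.append(i)
        elif n < p and n < threshold:     # EasyTripletMask condition
            easy.append(i)
        # Exact ties (n == p, or n == threshold with n <= p) land in neither group.
    return hard, easy


# Hypothetical cosine similarities: anchor-positive and anchor-negative.
pos = [0.9, 0.6, 0.95, 0.5]
neg = [0.3, 0.7, 0.85, 0.5]

with_threshold = split_triplets(pos, neg, threshold=0.8)
# An infinite threshold is one way to emulate dropping the 0.8 test.
without_threshold = split_triplets(pos, neg, threshold=float("inf"))

print(with_threshold)
print(without_threshold)
```

In this toy data, triplet 2 (negative at 0.85, positive at 0.95) is the only one that switches from hard to easy when the threshold is dropped, so the threshold only matters for negatives that are very close to the anchor yet still below the positive.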

Thanks—this is a great paper.

Ah, of course, that's the temperature parameter!
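For anyone else reading: dividing by 0.1 is the same as multiplying by 10, and it acts as a softmax temperature of 0.1. Below is a small plain-Python sketch (not the repository's PyTorch code) that checks this on one made-up triplet. It assumes column 0 of `Triplet_val` holds the positive similarity, which is what the `[:,0]` indexing suggests; the `log_softmax` helper and the numbers are hypothetical.

```python
import math


def log_softmax(row, temperature=1.0):
    """Numerically stable log-softmax of row / temperature."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


# Hypothetical (positive, negative) cosine similarities for one easy triplet.
pair = [0.7, 0.5]

loss_t1 = -log_softmax(pair, temperature=1.0)[0]    # no scaling
loss_t01 = -log_softmax(pair, temperature=0.1)[0]   # divide by 0.1, as in Loss.py
loss_x10 = -log_softmax([10 * x for x in pair])[0]  # multiply by 10 instead

print(loss_t1, loss_t01, loss_x10)
```

The last two losses agree up to floating-point error, and both are smaller than the unscaled loss. Cosine similarities are confined to [-1, 1], so without the scaling the softmax stays close to uniform; a lower temperature sharpens it.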