The annealing optimization strategy for A-Softmax loss
taey16 commented
Thanks for your nice repo.
I'm trying out your code.
My question is about the annealing optimization strategy for the A-Softmax loss, which the paper implements by introducing a lambda parameter.
Here, your implementation is:
```python
self.lamb = max(self.LambdaMin, self.LambdaMax / (1 + 0.1 * self.it))
output = cos_theta * 1.0
output[index] -= cos_theta[index] * (1.0 + 0) / (1 + self.lamb)
output[index] += phi_theta[index] * (1.0 + 0) / (1 + self.lamb)
```
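If I expand these in-place updates, the target-class logits become `(self.lamb * cos_theta[index] + phi_theta[index]) / (1 + self.lamb)`, while the non-target logits stay plain `cos_theta`; `self.lamb` itself decays from `LambdaMax` toward `LambdaMin` as `self.it` grows.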
But I think the cos term should be scaled by a factor of lambda, such that:
```python
output = cos_theta * self.lamb
output[index] -= cos_theta[index] * self.lamb / (1 + self.lamb)
output[index] += phi_theta[index] * 1.0 / (1 + self.lamb)
```
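For comparison, here is a minimal, self-contained sketch that evaluates both variants on dummy values (the toy tensors, the target index, and the lambda value are made up for illustration, not taken from the repo):

```python
# Minimal sketch comparing the two variants on toy values; the tensors,
# target class, and lambda below are made-up examples, not repo code.
import torch

cos_theta = torch.tensor([[0.3, 0.8, -0.2]])   # cos(theta_j) for 3 classes
phi_theta = torch.tensor([[0.3, -0.5, -0.2]])  # psi(theta_j) for 3 classes
index = torch.zeros_like(cos_theta, dtype=torch.bool)
index[0, 1] = True                             # class 1 is the target
lamb = 5.0

# Repo version: non-target logits stay cos(theta);
# target logit = (lamb * cos + phi) / (1 + lamb)
out_repo = cos_theta * 1.0
out_repo[index] -= cos_theta[index] * 1.0 / (1 + lamb)
out_repo[index] += phi_theta[index] * 1.0 / (1 + lamb)

# Proposed version: non-target logits become lamb * cos(theta);
# target logit works out to (lamb**2 * cos + phi) / (1 + lamb)
out_prop = cos_theta * lamb
out_prop[index] -= cos_theta[index] * lamb / (1 + lamb)
out_prop[index] += phi_theta[index] * 1.0 / (1 + lamb)

print(out_repo)  # tensor([[ 0.3000,  0.5833, -0.2000]])
print(out_prop)  # tensor([[ 1.5000,  3.2500, -1.0000]])
```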
Please share your thoughts.
Thanks!