gradient needs to be divided by batch size
samxuxiang commented
It appears that the gradient is not divided by the batch size in CenterlossFunc().
I changed it to:
from torch.autograd import Function

class CenterlossFunc(Function):
    @staticmethod
    def forward(ctx, feature, label, centers):
        ctx.save_for_backward(feature, label, centers)
        centers_batch = centers.index_select(0, label.long())
        # divide by the batch size so the loss is a per-sample average
        return (feature - centers_batch).pow(2).sum() / 2.0 / feature.size()[0]

    @staticmethod
    def backward(ctx, grad_output):
        feature, label, centers = ctx.saved_tensors
        centers_batch = centers.index_select(0, label.long())
        diff = centers_batch - feature
        # init every iteration
        counts = centers.new(centers.size(0)).fill_(1)
        ones = centers.new(label.size(0)).fill_(1)
        grad_centers = centers.new(centers.size()).fill_(0)
        # count how many samples of each class appear in the batch
        counts = counts.scatter_add_(0, label.long(), ones)
        # sum the per-sample differences per class, then average by the class counts
        grad_centers.scatter_add_(0, label.unsqueeze(1).expand(feature.size()).long(), diff)
        grad_centers = grad_centers / counts.view(-1, 1)
        # the feature gradient carries the same 1/batch factor as the loss
        return - grad_output.data * diff / feature.size()[0], None, grad_centers
Figure now looks like: [image attachment from the original issue]
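For completeness, here is a minimal sanity check of the new scaling (a sketch, assuming the CenterlossFunc class above is in scope; the shapes and labels are arbitrary):

import torch

batch, dim, num_classes = 4, 2, 3
feature = torch.randn(batch, dim, requires_grad=True)
label = torch.tensor([0, 1, 1, 2])
centers = torch.randn(num_classes, dim)

loss = CenterlossFunc.apply(feature, label, centers)
loss.backward()

# the loss should be the batch-averaged half squared distance to the centers
ref = (feature.detach() - centers[label]).pow(2).sum() / 2.0 / batch
assert torch.allclose(loss.detach(), ref)

# and the feature gradient should carry the same 1/batch factor
assert torch.allclose(feature.grad, (feature.detach() - centers[label]) / batch)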
jxgu1016 commented
Maybe you are right... The same averaging over the batch is also done in other loss functions like cross_entropy_loss.
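As a quick illustration of that point, torch.nn.functional.cross_entropy averages over the batch by default (reduction='mean'):

import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)
target = torch.randint(0, 10, (8,))

mean_loss = F.cross_entropy(logits, target)                 # default: averaged over the batch
sum_loss = F.cross_entropy(logits, target, reduction='sum') # summed instead
assert torch.allclose(mean_loss, sum_loss / logits.size(0))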
jxgu1016 commented
Could you open a pull request?
BTW, I prefer feature.size(0) to feature.size()[0].
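The two spellings return the same value; size(0) just asks for that dimension directly:

import torch

x = torch.randn(4, 2)
assert x.size(0) == x.size()[0] == 4  # size(0) returns the int directly; size() builds a torch.Size first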
jxgu1016 commented
fixed.