jxgu1016/MNIST_center_loss_pytorch

gradient needs to be divided by batch size

Closed this issue · 3 comments

It appears that the gradient is not being divided by the batch size in CenterlossFunc().

I changed it to:

@staticmethod
def forward(ctx, feature, label, centers):
    ctx.save_for_backward(feature, label, centers)
    centers_batch = centers.index_select(0, label.long())
    # squared distance to the class centers, averaged over the batch
    return (feature - centers_batch).pow(2).sum() / 2.0 / feature.size()[0]


@staticmethod
def backward(ctx, grad_output):
    feature, label, centers = ctx.saved_tensors
    centers_batch = centers.index_select(0, label.long())
    diff = centers_batch - feature
    # init every iteration
    counts = centers.new(centers.size(0)).fill_(1)
    ones = centers.new(label.size(0)).fill_(1)
    grad_centers = centers.new(centers.size()).fill_(0)

    # count how many samples of each class are in the batch
    # (counts start at 1, so classes absent from the batch do not cause division by zero)
    counts = counts.scatter_add_(0, label.long(), ones)
    # accumulate (center - feature) per class, then scale by the class count
    grad_centers.scatter_add_(0, label.unsqueeze(1).expand(feature.size()).long(), diff)
    grad_centers = grad_centers/(counts.view(-1, 1))
    # the feature gradient is now divided by the batch size
    return - grad_output.data * diff / feature.size()[0], None, grad_centers
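
As a quick sanity check (not part of the repo; sizes and names below are just illustrative), the feature gradient returned by this backward should match what plain autograd gives for the batch-averaged loss:

# Minimal sanity check, not from the repo: recompute the batch-averaged
# center loss with plain autograd and compare the feature gradient with
# the analytic value (feature - centers_batch) / batch_size that the
# backward above returns.
import torch

torch.manual_seed(0)
batch_size, feat_dim, num_classes = 8, 2, 10

feature = torch.randn(batch_size, feat_dim, requires_grad=True)
label = torch.randint(0, num_classes, (batch_size,))
centers = torch.randn(num_classes, feat_dim)

centers_batch = centers.index_select(0, label)
loss = (feature - centers_batch).pow(2).sum() / 2.0 / batch_size
loss.backward()

expected = (feature.detach() - centers_batch) / batch_size
print(torch.allclose(feature.grad, expected))  # True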

The figure now looks like this:

[figure: epoch 90]

Maybe you are right... The same operation also appears in other loss functions such as cross_entropy_loss.
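
For reference (a minimal illustration, not from the repo), PyTorch's F.cross_entropy averages over the batch by default, which is the same normalization proposed here:

import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)
target = torch.randint(0, 10, (8,))

# default reduction='mean' divides the summed loss by the batch size
mean_loss = F.cross_entropy(logits, target)
sum_loss = F.cross_entropy(logits, target, reduction='sum')
print(torch.allclose(mean_loss, sum_loss / logits.size(0)))  # True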

Could you open a pull request?
BTW, I prefer feature.size(0) to feature.size()[0]

fixed.