jxgu1016/MNIST_center_loss_pytorch

About autograd

zorrocai opened this issue · 5 comments

Hey~, I noticed that you wrote the center loss with a backward pass designed by yourself. What would happen if we defined only the forward function and let PyTorch's autograd handle the backward? I wonder whether there is any difference between the two.

  1. Actually, the backward function is not designed by me but by the authors of center loss, following Equation 4 in the paper (a minimal sketch of such a custom Function is shown after this list).
  2. The difference may be negligible, I think.
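For context, here is a minimal sketch (not the repo's exact code) of what such a custom torch.autograd.Function looks like: the forward computes the squared distance of each feature to its assigned center, and the backward returns the per-class averaged center update from Equation 4 instead of the raw summed gradient autograd would produce. Names such as CenterLossFunc are illustrative.

import torch
import torch.nn as nn
from torch.autograd import Function

class CenterLossFunc(Function):
    @staticmethod
    def forward(ctx, feat, label, centers):
        ctx.save_for_backward(feat, label, centers)
        centers_batch = centers.index_select(0, label.long())
        # L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2
        return (feat - centers_batch).pow(2).sum() / 2.0

    @staticmethod
    def backward(ctx, grad_output):
        feat, label, centers = ctx.saved_tensors
        centers_batch = centers.index_select(0, label.long())
        diff = feat - centers_batch
        # Eq. 4: delta c_j = sum_{y_i = j} (c_j - x_i) / (1 + n_j), i.e. each
        # center moves by the (damped) mean of its members' residuals rather
        # than by the raw summed gradient autograd would compute.
        counts = centers.new_ones(centers.size(0))
        counts.scatter_add_(0, label.long(),
                            torch.ones_like(label, dtype=feat.dtype))
        grad_centers = torch.zeros_like(centers)
        grad_centers.scatter_add_(
            0, label.long().unsqueeze(1).expand(-1, feat.size(1)), -diff)
        grad_centers = grad_centers / counts.unsqueeze(1)
        # gradients w.r.t. (feat, label, centers); the labels get none
        return grad_output * diff, None, grad_output * grad_centers

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super(CenterLoss, self).__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, label, feat):
        return CenterLossFunc.apply(feat, label, self.centers)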

Thanks for the quick reply~.
Actually, I meant that you wrote the backward inside the CenterLoss class.
I made some changes to the code, removing the self-defined backward function:

import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super(CenterLoss, self).__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        # self.centerlossfunc = CenterlossFunc.apply  # custom backward removed
        self.feat_dim = feat_dim

    def forward(self, label, feat):
        batch_size = feat.size(0)
        feat = feat.view(batch_size, -1)
        centers_batch = self.centers.index_select(0, label.long())
        # rely on autograd for the backward pass
        return (feat - centers_batch).pow(2).sum() / 2.0

Guess what happened?
The loss went to NaN. It seems the derivative function given by the author matters somehow.

Training... Epoch = 1
tensor(779.4907, device='cuda:0')
tensor(24004.1328, device='cuda:0')
tensor(6.8511e+05, device='cuda:0')
tensor(2.3724e+07, device='cuda:0')
tensor(6.8685e+08, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
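If I had to guess (unverified): without the custom backward, the centers receive the full summed gradient over the batch instead of the per-class averaged update of Equation 4, so with a sizeable batch the centers, and then the loss, explode. A sketch of the smallest change that might tame the pure-autograd version, normalizing the loss by batch size (my assumption, untested here):

    def forward(self, label, feat):
        batch_size = feat.size(0)
        feat = feat.view(batch_size, -1)
        centers_batch = self.centers.index_select(0, label.long())
        # averaging keeps gradient magnitudes independent of batch size
        return (feat - centers_batch).pow(2).sum() / 2.0 / batch_size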

That's weird; I believe it should work.
You may check my old version of the center loss.

Well, it seems that in your
https://github.com/jxgu1016/MNIST_center_loss_pytorch/blob/dbeea5380de8a3c6b1b3b3f2c411b980e143dd87/CenterLoss.py
instead of using the update rule from the original paper, you chose a modified one.
What a brilliant change it is!

That version still had some problems, so here comes the new version. More details are given in the README.
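For anyone landing here, a typical way to wire a center loss like this into training looks roughly like the sketch below. Identifiers such as model, train_loader, and loss_weight are assumptions, not this repo's exact API; check the README for the real usage. A separate (often larger) learning rate for the centers is a common choice.

import torch.nn.functional as F
import torch.optim as optim

# `model` is a hypothetical network returning (features, logits);
# `train_loader` is a hypothetical DataLoader over (images, labels).
center_loss = CenterLoss(num_classes=10, feat_dim=2)
optimizer_model = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
optimizer_center = optim.SGD(center_loss.parameters(), lr=0.5)
loss_weight = 1.0  # assumed weighting between the two losses

for images, labels in train_loader:
    features, logits = model(images)
    loss = F.cross_entropy(logits, labels) + \
        loss_weight * center_loss(labels, features)
    optimizer_model.zero_grad()
    optimizer_center.zero_grad()
    loss.backward()
    optimizer_model.step()
    optimizer_center.step()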