Nesterov flag
rilshok opened this issue · 3 comments
rilshok commented
Why does the Nesterov flag not affect the result of calculations?
/training/solver.go:
// Update returns the update for a given weight
func (o *SGD) Update(value, gradient float64, iteration, idx int) float64 {
	lr := o.lr / (1 + o.decay*float64(iteration))
	o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	if o.nesterov {
		o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	}
	return o.moments[idx]
}
rilshok commented
Also, the learning rate is recalculated for every weight. Why not move the learning-rate calculation out of the Update function, so it is computed once per iteration?
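To illustrate the point, here is a minimal sketch of hoisting the decay calculation out of the per-weight call. The decayed rate depends only on the iteration, so it can be computed once and passed in. The SGD struct here only mirrors the fields used in the snippet above, and LearningRate and UpdateWithRate are hypothetical names, not part of the library's API.

```go
package main

import "fmt"

// SGD mirrors only the fields used in the quoted Update method.
// This is an assumed layout for illustration, not the library's type.
type SGD struct {
	lr, decay, momentum float64
	moments             []float64
}

// LearningRate (hypothetical) returns the decayed rate for an iteration.
// It does not depend on the weight index, so one call per iteration suffices.
func (o *SGD) LearningRate(iteration int) float64 {
	return o.lr / (1 + o.decay*float64(iteration))
}

// UpdateWithRate (hypothetical) applies plain momentum with a precomputed rate.
func (o *SGD) UpdateWithRate(lr, gradient float64, idx int) float64 {
	o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	return o.moments[idx]
}

func main() {
	o := &SGD{lr: 0.1, decay: 0.5, momentum: 0.9, moments: make([]float64, 3)}
	gradients := []float64{1, -2, 0.5}

	// Computed once per iteration instead of once per weight.
	lr := o.LearningRate(2)
	for idx, g := range gradients {
		fmt.Println(idx, o.UpdateWithRate(lr, g, idx))
	}
}
```

The saving per call is one division and one multiplication, so this is mostly a readability change unless the number of weights is very large.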
patrikeh commented
I'm not following. The flag does seem to affect the results, and the calculation looks OK to me. Fair point about the LR calculation, though.
rilshok commented
I was confused because the line inside the if block repeats the line above it. Now I understand that o.moments[idx] has already changed by the time the second line runs, so the second assignment produces a different value. I have not yet checked whether this is a correct implementation of Nesterov momentum, though.
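One way to check is to trace a single weight by hand. The sketch below compares the arithmetic of the quoted Update (with nesterov set to true) against a common alternative formulation in which the stored velocity is updated once and the look-ahead step is only returned, never stored. Both function names are hypothetical, the lookahead variant is an assumption about what might be intended rather than the library's method, and the values (momentum 0.9, lr 0.1, constant gradient 1, no decay) are chosen only for illustration.

```go
package main

import "fmt"

// quotedStep reproduces the arithmetic of the quoted Update with
// nesterov == true: the stored moment is overwritten twice per call.
func quotedStep(moment, momentum, lr, gradient float64) (update, stored float64) {
	moment = momentum*moment - lr*gradient
	moment = momentum*moment - lr*gradient
	return moment, moment
}

// lookaheadStep is a hypothetical variant: it stores the plain momentum
// velocity and returns the look-ahead step without writing it back.
func lookaheadStep(moment, momentum, lr, gradient float64) (update, stored float64) {
	stored = momentum*moment - lr*gradient
	update = momentum*stored - lr*gradient
	return update, stored
}

func main() {
	const momentum, lr, gradient = 0.9, 0.1, 1.0
	var mq, ml float64 // stored moments for each variant, both starting at 0
	for step := 1; step <= 2; step++ {
		var uq, ul float64
		uq, mq = quotedStep(mq, momentum, lr, gradient)
		ul, ml = lookaheadStep(ml, momentum, lr, gradient)
		fmt.Printf("step %d: quoted=%.4f lookahead=%.4f\n", step, uq, ul)
	}
}
```

Both variants return -0.19 on the first step, compared with -0.1 for plain momentum, so the flag does change the result. They diverge on the second step (-0.3439 versus -0.2710) because the quoted code carries the look-ahead value forward as the stored moment.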