Nesterov flag

Question

rilshok opened this issue 5 years ago · 3 comments

Why does the Nesterov flag not affect the result of calculations?
/training/solver.go:

// Update returns the update for a given weight
func (o *SGD) Update(value, gradient float64, iteration, idx int) float64 {
	lr := o.lr / (1 + o.decay*float64(iteration))

	o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient

	if o.nesterov {
		o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	}

	return o.moments[idx]
}

Answer 1 · 2019-12-08T12:34:49.000Z

Also, the learning rate is recalculated for every weight. Why not move the lr calculation outside the Update function, so it is computed once per iteration?
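To make the suggestion concrete, here is a minimal sketch of hoisting the learning-rate calculation out of the per-weight call. The SGD struct is inferred from the snippet in the question, and LearningRate, UpdateWithLR, and Step are hypothetical names that do not come from the library. The sketch also assumes the caller adds the returned value to the weight.

```go
package main

import "fmt"

// SGD mirrors the fields used by the Update method quoted in the question.
// The field set is inferred from that snippet, not copied from the library.
type SGD struct {
	lr, decay, momentum float64
	nesterov            bool
	moments             []float64
}

// LearningRate is a hypothetical helper: it computes the decayed learning
// rate for an iteration, so it can be evaluated once instead of per weight.
func (o *SGD) LearningRate(iteration int) float64 {
	return o.lr / (1 + o.decay*float64(iteration))
}

// UpdateWithLR is a hypothetical variant of Update that takes the
// precomputed learning rate. The momentum logic is unchanged.
func (o *SGD) UpdateWithLR(lr, gradient float64, idx int) float64 {
	o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	if o.nesterov {
		o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	}
	return o.moments[idx]
}

// Step applies one iteration to all weights, computing lr only once.
// Adding the update to the weight is an assumption about the caller.
func (o *SGD) Step(weights, gradients []float64, iteration int) {
	lr := o.LearningRate(iteration)
	for i := range weights {
		weights[i] += o.UpdateWithLR(lr, gradients[i], i)
	}
}

func main() {
	o := &SGD{lr: 0.1, decay: 0.5, momentum: 0.9, moments: make([]float64, 2)}
	w := []float64{1, 2}
	o.Step(w, []float64{1, -1}, 2)
	fmt.Println(w)
}
```

The saving is one division per weight, so it probably only matters for large networks, but it also makes it clearer that lr depends on the iteration alone.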

Answer 2 · 2019-12-08T23:18:43.000Z

I'm not following. The flag does seem to affect the results, and the calculation looks OK to me. Fair point about the LR calculation, though.

Answer 3 · 2019-12-09T04:10:41.000Z

I was confused because the line inside the if block repeats the line above it. Now I understand that o.moments[idx] has already been updated by the first line, so the second evaluation produces a different value. I have not yet checked whether the resulting update is correct, though.
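A quick way to check that the flag changes the result is to run the same logic with and without it. This is a minimal sketch: the update function below is a hypothetical standalone copy of the momentum logic from the question, with the optimizer state passed in as arguments and lr assumed to be already decayed.

```go
package main

import "fmt"

// update reproduces the momentum logic of the Update method quoted in the
// question, with the state passed in explicitly instead of stored on a struct.
func update(moment, momentum, lr, gradient float64, nesterov bool) float64 {
	moment = momentum*moment - lr*gradient
	if nesterov {
		// Same expression as above, but moment now holds the value just
		// computed, so this evaluates momentum*(new moment) - lr*gradient.
		moment = momentum*moment - lr*gradient
	}
	return moment
}

func main() {
	plain := update(0, 0.9, 0.1, 1, false)
	nesterov := update(0, 0.9, 0.1, 1, true)
	fmt.Printf("plain=%.4f nesterov=%.4f\n", plain, nesterov)
}
```

With a zero initial moment, momentum 0.9, lr 0.1, and gradient 1, the plain step is -0.1 and the Nesterov step is 0.9*(-0.1) - 0.1 = -0.19, so the flag does change the returned update. One thing the sketch does not settle: the original method also stores the look-ahead value back into o.moments[idx], whereas some formulations of Nesterov momentum keep the plain velocity as state and only apply the look-ahead step to the weight, so later iterations could differ from those formulations.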