patrikeh/go-deep

Nesterov flag

rilshok opened this issue · 3 comments

Why does the Nesterov flag not affect the result of calculations?
/training/solver.go:

// Update returns the update for a given weight
func (o *SGD) Update(value, gradient float64, iteration, idx int) float64 {
	lr := o.lr / (1 + o.decay*float64(iteration))

	o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient

	if o.nesterov {
		o.moments[idx] = o.momentum*o.moments[idx] - lr*gradient
	}

	return o.moments[idx]
}

Also, the learning rate is recalculated for every weight. Why not move the lr calculation out of the Update function?
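To illustrate the learning-rate point, here is a minimal sketch of computing the decayed rate once per iteration and reusing it for every weight. The names `decayedLR` and `applyAll` are hypothetical and are not part of go-deep; only the decay formula and the momentum update are taken from the Update function quoted above.

```go
package main

import "fmt"

// decayedLR is a hypothetical helper. It applies the same decay formula as
// the quoted Update function: lr / (1 + decay*iteration).
func decayedLR(base, decay float64, iteration int) float64 {
	return base / (1 + decay*float64(iteration))
}

// applyAll is a hypothetical helper. It updates every moment in place using
// a single precomputed learning rate (plain momentum, no Nesterov step).
func applyAll(moments, gradients []float64, momentum, lr float64) {
	for i, g := range gradients {
		moments[i] = momentum*moments[i] - lr*g
	}
}

func main() {
	moments := make([]float64, 3)
	gradients := []float64{0.5, -1.0, 2.0}

	// The rate depends only on the iteration, so one computation per
	// iteration is enough for all weights.
	lr := decayedLR(0.1, 0.01, 10)
	applyAll(moments, gradients, 0.9, lr)

	fmt.Println(lr, moments)
}
```

The saving is one division per weight per iteration, so it is probably small; the main benefit may be that the per-weight update no longer needs to receive the iteration number.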

I'm not following. It seems to affect the results, and the calculation looks OK to me. Fair point about the LR calculation, though.

I was confused because the line inside the if block repeats the line above it. Now I understand that o.moments[idx] has already changed by that point, so the second assignment produces a different value. However, I have not yet analyzed whether this works correctly.
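A small sketch makes the difference visible. The `sgd` type below is a hypothetical, stripped-down stand-in for the solver (a single weight, no learning-rate decay), not the library's actual type; its `update` method copies the two assignments from the quoted code. It shows only that the flag changes the returned value, not whether the result matches any particular reference formulation of Nesterov momentum.

```go
package main

import "fmt"

// sgd is a hypothetical single-weight stand-in for the quoted SGD solver.
type sgd struct {
	lr, momentum float64
	nesterov     bool
	moment       float64
}

// update mirrors the two assignments in the quoted Update function.
func (o *sgd) update(gradient float64) float64 {
	o.moment = o.momentum*o.moment - o.lr*gradient

	if o.nesterov {
		// o.moment was just overwritten, so the same expression now
		// evaluates to a different value than on the line above.
		o.moment = o.momentum*o.moment - o.lr*gradient
	}

	return o.moment
}

func main() {
	plain := &sgd{lr: 0.1, momentum: 0.9}
	nest := &sgd{lr: 0.1, momentum: 0.9, nesterov: true}

	// With a zero initial moment and gradient 1, the plain step is -lr,
	// while the Nesterov branch yields -(momentum*lr + lr).
	fmt.Println(plain.update(1.0), nest.update(1.0))
}
```

Note that in this form the stored moment itself is overwritten by the second assignment, so the flag also affects every later iteration, not just the value returned for the current one.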