Separate optimisation from gradient calculation.
HuwCampbell commented
At the moment, we have the gradient calculations interwoven with the optimisation algorithm inside `runBackwards`. This is a bit terrible, as it means we can't give users a choice of algorithm.

It would be better for `runBackwards` to return just the gradients, and have training take a (hopefully minibatched) optimisation strategy.
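As a rough illustration of the split, here is a minimal sketch in plain Haskell. It uses toy list-of-`Double` parameters rather than Grenade's actual network and gradient types, and `runBackwards'`, `Optimiser`, `sgd`, and `trainStep` are illustrative names, not an existing API: the backwards pass only computes gradients, and the training step is parameterised by whichever update rule the user picks.

```haskell
module Main where

-- Parameters and gradients are both just flat vectors in this sketch.
type Params    = [Double]
type Gradients = [Double]

-- | An optimiser is any rule turning current parameters and freshly
--   computed gradients into updated parameters.
type Optimiser = Params -> Gradients -> Params

-- | The backwards pass now returns only the gradients; it no longer
--   applies any update itself. Here: gradient of mean squared error
--   for a one-parameter linear model, as a stand-in.
runBackwards' :: Params -> [(Double, Double)] -> Gradients
runBackwards' ps samples =
  [ 2 * sum [ (head ps * x - y) * x | (x, y) <- samples ]
      / fromIntegral (length samples) ]

-- | Plain stochastic gradient descent as one choice of optimiser.
sgd :: Double -> Optimiser
sgd lr ps gs = zipWith (\p g -> p - lr * g) ps gs

-- | Training is parameterised by the optimiser, so swapping in a
--   different algorithm needs no change to the gradient code.
trainStep :: Optimiser -> Params -> [(Double, Double)] -> Params
trainStep opt ps batch = opt ps (runBackwards' ps batch)

main :: IO ()
main = do
  let batch = [(1, 2), (2, 4), (3, 6)]                        -- y = 2x
      steps = iterate (\ps -> trainStep (sgd 0.05) ps batch) [0]
  print (steps !! 100)                                        -- approaches [2.0]
```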
Allowing Nesterov and SGD (with momentum) should be trivial; Adagrad and friends might need a rethink of how we keep track of training updates.
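For those stateful optimisers, one option would be to let each optimiser carry its own state (velocities, squared-gradient accumulators, ...) alongside the parameters. A hedged sketch under the same toy representation as above; `StatefulOptimiser`, `momentum`, and `adagrad` are made-up names, not Grenade's API:

```haskell
module Main where

type Params    = [Double]
type Gradients = [Double]

-- | A stateful optimiser threads its own state through every update.
data StatefulOptimiser s = StatefulOptimiser
  { initState :: Params -> s
  , step      :: s -> Params -> Gradients -> (Params, s)
  }

-- | SGD with momentum: the state is one velocity per parameter.
momentum :: Double -> Double -> StatefulOptimiser [Double]
momentum lr mu = StatefulOptimiser
  { initState = map (const 0)
  , step = \vs ps gs ->
      let vs' = zipWith (\v g -> mu * v - lr * g) vs gs
          ps' = zipWith (+) ps vs'
      in  (ps', vs')
  }

-- | Adagrad: the state is the running sum of squared gradients.
adagrad :: Double -> StatefulOptimiser [Double]
adagrad lr = StatefulOptimiser
  { initState = map (const 0)
  , step = \ss ps gs ->
      let ss' = zipWith (\s g -> s + g * g) ss gs
          ps' = zipWith3 (\p g s -> p - lr * g / (sqrt s + 1e-8)) ps gs ss'
      in  (ps', ss')
  }

main :: IO ()
main = do
  let opt        = momentum 0.1 0.9
      ps0        = [1.0, -2.0]
      gs         = [0.5, -0.5]       -- pretend gradients for two parameters
      (ps1, st1) = step opt (initState opt ps0) ps0 gs
      (ps2, _)   = step opt st1 ps1 gs
  print ps2
```

Threading that per-optimiser state through training is exactly the "keep track of training updates" question: wherever the gradients come back from `runBackwards`, the optimiser state has to be initialised once and carried between steps.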