Separate optimisation from gradient calculation.
HuwCampbell commented
At the moment, we have the gradient calculations interwoven with the optimisation algorithm inside `runBackwards`. This is a bit terrible, as it means we can't give users a choice of algorithm.

It would be better for `runBackwards` to return just the gradients, and have training take a (hopefully minibatched) optimisation strategy.
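As a rough illustration of the split, here is a minimal sketch in plain Haskell. It uses toy list-of-`Double` parameters rather than Grenade's actual network and gradient types, and `runBackwards'`, `Optimiser`, `sgd`, and `trainStep` are illustrative names, not an existing API: the backwards pass only computes gradients, and the training step is parameterised by whichever update rule the user picks.

```haskell
module Main where

-- Parameters and gradients are both just flat vectors in this sketch.
type Params    = [Double]
type Gradients = [Double]

-- | An optimiser is any rule turning current parameters and freshly
--   computed gradients into updated parameters.
type Optimiser = Params -> Gradients -> Params

-- | The backwards pass now returns only the gradients; it no longer
--   applies any update itself. Here: gradient of mean squared error
--   for a one-parameter linear model, as a stand-in.
runBackwards' :: Params -> [(Double, Double)] -> Gradients
runBackwards' ps samples =
  [ 2 * sum [ (head ps * x - y) * x | (x, y) <- samples ]
      / fromIntegral (length samples) ]

-- | Plain stochastic gradient descent as one choice of optimiser.
sgd :: Double -> Optimiser
sgd lr ps gs = zipWith (\p g -> p - lr * g) ps gs

-- | Training is parameterised by the optimiser, so swapping in a
--   different algorithm needs no change to the gradient code.
trainStep :: Optimiser -> Params -> [(Double, Double)] -> Params
trainStep opt ps batch = opt ps (runBackwards' ps batch)

main :: IO ()
main = do
  let batch = [(1, 2), (2, 4), (3, 6)]                        -- y = 2x
      steps = iterate (\ps -> trainStep (sgd 0.05) ps batch) [0]
  print (steps !! 100)                                        -- approaches [2.0]
```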
Allowing Nesterov and SGD (with momentum) should be trivial; Adagrad and friends might need a rethink of how we keep track of training updates.
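For those stateful optimisers, one option would be to let each optimiser carry its own state (velocities, squared-gradient accumulators, ...) alongside the parameters. A hedged sketch under the same toy representation as above; `StatefulOptimiser`, `momentum`, and `adagrad` are made-up names, not Grenade's API:

```haskell
module Main where

type Params    = [Double]
type Gradients = [Double]

-- | A stateful optimiser threads its own state through every update.
data StatefulOptimiser s = StatefulOptimiser
  { initState :: Params -> s
  , step      :: s -> Params -> Gradients -> (Params, s)
  }

-- | SGD with momentum: the state is one velocity per parameter.
momentum :: Double -> Double -> StatefulOptimiser [Double]
momentum lr mu = StatefulOptimiser
  { initState = map (const 0)
  , step = \vs ps gs ->
      let vs' = zipWith (\v g -> mu * v - lr * g) vs gs
          ps' = zipWith (+) ps vs'
      in  (ps', vs')
  }

-- | Adagrad: the state is the running sum of squared gradients.
adagrad :: Double -> StatefulOptimiser [Double]
adagrad lr = StatefulOptimiser
  { initState = map (const 0)
  , step = \ss ps gs ->
      let ss' = zipWith (\s g -> s + g * g) ss gs
          ps' = zipWith3 (\p g s -> p - lr * g / (sqrt s + 1e-8)) ps gs ss'
      in  (ps', ss')
  }

main :: IO ()
main = do
  let opt        = momentum 0.1 0.9
      ps0        = [1.0, -2.0]
      gs         = [0.5, -0.5]       -- pretend gradients for two parameters
      (ps1, st1) = step opt (initState opt ps0) ps0 gs
      (ps2, _)   = step opt st1 ps1 gs
  print ps2
```

Threading that per-optimiser state through training is exactly the "keep track of training updates" question: wherever the gradients come back from `runBackwards`, the optimiser state has to be initialised once and carried between steps.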