Reimplementation of adaptive gradient descent variants such as RMSProp, AdaGrad, and Adam.
This code is a refactored version of the setup used for the experiments in the second assignment of the course
EE-556 Mathematics of Data: From Theory To Computation, taught by Prof. Cevher at EPFL.
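
For orientation, here is a minimal sketch of the textbook update rules for the three methods named above. It is not the code from this repository or from the course assignment. The function names (`adagrad_step`, `rmsprop_step`, `adam_step`, `minimize`), the `state` dictionary, and the default hyperparameters are all assumptions made for illustration.

```python
import math

# Hypothetical helpers: names, signatures and defaults are illustrative only.
# Each step takes the iterate x, its gradient g (lists of floats) and a
# mutable state dict, and returns the next iterate.

def adagrad_step(x, g, state, lr=0.1, eps=1e-8):
    # Accumulate the running sum of squared gradients per coordinate.
    q = state.setdefault("q", [0.0] * len(x))
    for i, gi in enumerate(g):
        q[i] += gi * gi
    return [xi - lr * gi / (math.sqrt(qi) + eps) for xi, gi, qi in zip(x, g, q)]

def rmsprop_step(x, g, state, lr=0.01, beta=0.9, eps=1e-8):
    # Replace the sum with an exponential moving average of squared gradients.
    q = state.setdefault("q", [0.0] * len(x))
    for i, gi in enumerate(g):
        q[i] = beta * q[i] + (1.0 - beta) * gi * gi
    return [xi - lr * gi / (math.sqrt(qi) + eps) for xi, gi, qi in zip(x, g, q)]

def adam_step(x, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving averages of the gradient (m) and squared gradient (v),
    # each corrected for its bias toward the zero initialization.
    t = state["t"] = state.get("t", 0) + 1
    m = state.setdefault("m", [0.0] * len(x))
    v = state.setdefault("v", [0.0] * len(x))
    for i, gi in enumerate(g):
        m[i] = beta1 * m[i] + (1.0 - beta1) * gi
        v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    return [xi - lr * mh / (math.sqrt(vh) + eps) for xi, mh, vh in zip(x, m_hat, v_hat)]

def minimize(step, grad, x0, n_iters=500, **hyper):
    # Generic driver: apply one of the step functions n_iters times.
    x, state = list(x0), {}
    for _ in range(n_iters):
        x = step(x, grad(x), state, **hyper)
    return x

# Toy problem (an assumption, not from the assignment):
# f(x) = 0.5 * (x0^2 + 10 * x1^2), with gradient [x0, 10 * x1].
def toy_grad(x):
    return [x[0], 10.0 * x[1]]

for name, step in [("AdaGrad", adagrad_step), ("RMSProp", rmsprop_step), ("Adam", adam_step)]:
    print(name, minimize(step, toy_grad, [1.0, 1.0]))
```

All three methods divide the gradient by a per-coordinate estimate of its magnitude. They differ in how that estimate is built: a running sum (AdaGrad), a moving average (RMSProp), or a bias-corrected moving average combined with momentum (Adam).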