A simple Python implementation of AdaGrad, created with the help of:
http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf
http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent/
The code is deliberately kept simple so that it is easy to understand. For efficiency, it may be better to convert some of the data structures that are currently Python lists to NumPy arrays.
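
For reference, below is a minimal sketch of the per-parameter AdaGrad update described in the links above. It is not this repo's code: the names (adagrad_update, cache, eta, eps) are illustrative, and it deliberately uses plain lists in the same spirit as the implementation.

    import math
    import random

    def adagrad_update(weights, grads, cache, eta=0.1, eps=1e-8):
        """One AdaGrad step: each coordinate's step size shrinks with the
        square root of its accumulated squared gradients."""
        for i in range(len(weights)):
            cache[i] += grads[i] ** 2  # running sum of squared gradients
            weights[i] -= eta * grads[i] / (math.sqrt(cache[i]) + eps)
        return weights, cache

    # Toy usage: minimize f(w) = w[0]^2 + w[1]^2 from a random start.
    w = [random.uniform(-1.0, 1.0) for _ in range(2)]
    cache = [0.0, 0.0]
    for _ in range(200):
        grads = [2.0 * wi for wi in w]  # gradient of f at w
        w, cache = adagrad_update(w, grads, cache)
    print(w)  # both coordinates end up near 0

Because each coordinate keeps its own accumulator, frequently updated parameters get progressively smaller steps while rarely updated ones keep larger ones, which is why AdaGrad needs less hand-tuning of the learning rate than plain SGD.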