Stochastic gradient descent optimizer using second-order information, as described in Ye, C., Yang, Y., Fermuller, C., & Aloimonos, Y. (2017), "On the Importance of Consistency in Training Deep Neural Networks", arXiv preprint arXiv:1708.00631.

# Arguments
    lr: float >= 0. Learning rate.
    momentum: float >= 0. Momentum factor that accelerates SGD in the relevant direction and dampens oscillations.
    decay: float >= 0. Learning rate decay over each update.
    nesterov: boolean. Whether to apply Nesterov momentum.
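
# Example

A minimal usage sketch, assuming the optimizer is exposed through the
standard Keras optimizer interface; the class name `SGD`, the import
paths, the model, and the hyperparameter values below are illustrative
assumptions, not part of this documentation.

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD  # assumed import path and class name

# Illustrative two-layer model; any Keras model is compiled the same way.
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])

# Instantiate the optimizer with the documented arguments.
# `momentum` smooths and accelerates the updates, `decay` shrinks the
# learning rate on every update, and `nesterov=True` switches on
# Nesterov momentum.
opt = SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)

model.compile(optimizer=opt,
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

For reference, the stock Keras SGD applies `decay` as
`lr / (1 + decay * iterations)`; whether this optimizer follows the same
schedule is an assumption here.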