Gradient Descent Finds Global Minima of Deep Neural Networks https://arxiv.org/pdf/1811.03804.pdf
Mixing of Stochastic Accelerated Gradient Descent https://export.arxiv.org/pdf/1910.14616
vqSGD: Vector Quantized Stochastic Gradient Descent https://arxiv.org/pdf/1911.07971v1.pdf