A curated list of resources for second-order stochastic optimization methods in machine learning.
- Numerical Optimization by Jorge Nocedal and Stephen J. Wright, 2006 💵
- Introduction to Optimization and Data Fitting by H. B. Nielsen and K. Madsen, 2010
- Optimization for Machine Learning by Elad Hazan, 2019
- Topics in Machine Learning: Neural Net Training Dynamics (Winter 2022) by Roger Grosse, University of Toronto, 2022
- Optimization Methods for Large-Scale Machine Learning by Léon Bottou, Frank E. Curtis, Jorge Nocedal, 2016.
- Exact and inexact subsampled Newton methods for optimization by Raghu Bollapragada, Richard H. Byrd, Jorge Nocedal, 2018.
- Empirical Analysis of the Hessian of Over-Parametrized Neural Networks by Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou, 2017.
- The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size by Vardan Papyan, 2018.
- PyHessian: Neural Networks Through the Lens of the Hessian by Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney, 2019.
- A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization by Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, Vineeth N Balasubramanian, 2020.
- AdaHessian: An Adaptive Second Order Optimizer for Machine Learning by Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney, 2020. Algorithm: AdaHessian (a minimal Hutchinson-style diagonal-Hessian sketch appears after the libraries list below)
- Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training by Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma, 2023. Algorithm: Sophia
- Learning Recurrent Neural Networks with Hessian-Free Optimization by James Martens, Ilya Sutskever, 2011.
- Training Neural Networks with Stochastic Hessian-Free Optimization by Ryan Kiros, 2013. Algorithm: SHF
- A Stochastic Quasi-Newton Method for Large-Scale Optimization by R.H. Byrd, S.L. Hansen, J. Nocedal, Y. Singer, 2014.
- A Multi-Batch L-BFGS Method for Machine Learning by Albert S. Berahas, Jorge Nocedal, Martin Takáč, 2016.
- Stochastic Quasi-Newton with Line-Search Regularization by Adrian Wills, Thomas Schön, 2019. Algorithm: SQN
- Practical Quasi-Newton Methods for Training Deep Neural Networks by Donald Goldfarb, Yi Ren, Achraf Bahamou, 2020.
- Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks by Yi Ren and Donald Goldfarb, 2019. Algorithm: SWM-GN, SWM-NG
- On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs by Matilde Gargiani et al., 2020. Algorithm: SGN
- Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization by Quoc Tran-Dinh et al., 2020. Algorithm: SGN with SARAH estimators
- Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates by Johannes J. Brust, 2021. Discusses using stochastic Jacobian estimates in nonlinear least squares for scalable machine learning. Algorithm: NLLS1, NLLSL
- Improving Levenberg-Marquardt Algorithm for Neural Networks by Omead Pooladzandi and Yiming Zhou, 2022. Algorithm: LM
- Rethinking Gauss-Newton for learning over-parameterized models by Michael Arbel et al., 2023.
- Exact Gauss-Newton Optimization for Training Deep Neural Networks by Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon, 2024. Algorithm: EGN
- Optimizing Neural Networks with Kronecker-factored Approximate Curvature by James Martens and Roger Grosse, 2015. Algorithm: K-FAC
- Second-order optimization with lazy Hessians by Nikita Doikov, El Mahdi Chayti, Martin Jaggi, 2022.
- Optax - mostly first-order accelerated methods
- Somax - second-order stochastic solvers
- JAXopt - deterministic second-order methods (e.g., Gauss-Newton, Levenberg-Marquardt) and the stochastic first-order methods PolyakSGD and ArmijoSGD (see the usage sketch after this list)
- KFAC-JAX - implementation of K-FAC from the DeepMind team
- AdaHessianJax - implementation of the AdaHessian optimizer by Nestor Demeure
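
Several entries above (PyHessian, AdaHessian, the Hessian-free papers) build on Hessian-vector products rather than an explicit Hessian. Below is a minimal, hedged sketch of a Hutchinson-style diagonal-Hessian estimate in JAX; the function name `hutchinson_hessian_diag` and the toy quadratic are illustrative assumptions, not code taken from any of the listed papers or libraries.

```python
import jax
import jax.numpy as jnp

def hutchinson_hessian_diag(loss_fn, params, key, num_samples=10):
    """Estimate diag(H) via Hutchinson's method: E[z * (H z)] with Rademacher z."""
    grad_fn = jax.grad(loss_fn)

    def hvp(v):
        # Hessian-vector product as the JVP of the gradient (no explicit Hessian).
        return jax.jvp(grad_fn, (params,), (v,))[1]

    def one_sample(k):
        # Rademacher probe vector z with entries in {-1, +1}.
        z = jax.random.rademacher(k, params.shape).astype(params.dtype)
        return z * hvp(z)

    keys = jax.random.split(key, num_samples)
    return jnp.mean(jax.vmap(one_sample)(keys), axis=0)

# Toy quadratic whose Hessian diagonal is known exactly: diag(A) = [1, 2, 3].
A = jnp.diag(jnp.array([1.0, 2.0, 3.0]))
loss = lambda w: 0.5 * w @ A @ w
w0 = jnp.ones(3)
print(hutchinson_hessian_diag(loss, w0, jax.random.PRNGKey(0), num_samples=64))
```

For a diagonal quadratic the estimate is exact (each `z_i**2 == 1`), which makes it a convenient sanity check; AdaHessian-style optimizers additionally smooth such diagonal estimates with moving averages before preconditioning the gradient.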
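
The JAXopt entry mentions deterministic Gauss-Newton and Levenberg-Marquardt solvers; here is a small, hedged usage sketch. The residual function and the toy exponential-fit data are made-up illustrations; `jaxopt.GaussNewton` (and its `run` method) is the only library API assumed.

```python
import jax.numpy as jnp
import jaxopt

# Made-up curve-fitting problem: fit y ≈ a * exp(-b * x) with true (a, b) = (2, 3).
x = jnp.linspace(0.0, 1.0, 20)
y = 2.0 * jnp.exp(-3.0 * x)

def residuals(params, x, y):
    # Gauss-Newton minimizes 0.5 * ||residuals(params)||^2.
    a, b = params
    return a * jnp.exp(-b * x) - y

# jaxopt.LevenbergMarquardt(residual_fun=residuals) can be swapped in the same way.
solver = jaxopt.GaussNewton(residual_fun=residuals, maxiter=20)
params, state = solver.run(jnp.array([1.0, 1.0]), x, y)
print(params)  # should approach (2.0, 3.0)
```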