A curated list of resources for second-order stochastic optimization methods in machine learning.
- Numerical Optimization by Jorge Nocedal and Stephen J. Wright, 2006 💵
- Introduction to Optimization and Data Fitting by H. B. Nielsen and K. Madsen, 2010
- Optimization for Machine Learning by Elad Hazan, 2019
- Topics in Machine Learning: Neural Net Training Dynamics (Winter 2022) by Roger Grosse, University of Toronto, 2022
- Optimization Methods for Large-Scale Machine Learning by Léon Bottou, Frank E. Curtis, Jorge Nocedal, 2016.
- Exact and inexact subsampled Newton methods for optimization by Raghu Bollapragada, Richard H. Byrd, Jorge Nocedal, 2018.
- Empirical Analysis of the Hessian of Over-Parametrized Neural Networks by Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou, 2017.
- The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size by Vardan Papyan, 2018.
- PyHessian: Neural Networks Through the Lens of the Hessian by Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney, 2019.
- A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization by Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, Vineeth N Balasubramanian, 2020.
- AdaHessian: An Adaptive Second Order Optimizer for Machine Learning by Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney, 2020. Algorithm: AdaHessian (a minimal Hutchinson-style diagonal-Hessian sketch appears after the libraries list below)
- Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training by Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma, 2023. Algorithm: Sophia
- Learning Recurrent Neural Networks with Hessian-Free Optimization by James Martens, Ilya Sutskever, 2011.
- Training Neural Networks with Stochastic Hessian-Free Optimization by Ryan Kiros, 2013. Algorithm: SHF
- A Stochastic Quasi-Newton Method for Large-Scale Optimization by R.H. Byrd, S.L. Hansen, J. Nocedal, Y. Singer, 2014.
- A Multi-Batch L-BFGS Method for Machine Learning by Albert S. Berahas, Jorge Nocedal, Martin Takáč, 2016.
- Stochastic Quasi-Newton with Line-Search Regularization by Adrian Wills, Thomas Schön, 2019. Algorithm: SQN
- Practical Quasi-Newton Methods for Training Deep Neural Networks by Donald Goldfarb, Yi Ren, Achraf Bahamou, 2020.
- Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks by Yi Ren and Donald Goldfarb, 2019. Algorithm: SWM-GN, SWM-NG
- On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs by Matilde Gargiani et al., 2020. Algorithm: SGN
- Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization by Quoc Tran-Dinh et al., 2020. Algorithm: SGN with SARAH estimators
- Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates by Johannes J. Brust, 2021. Discusses using stochastic Jacobian estimates in nonlinear least squares for scalable machine learning. Algorithm: NLLS1, NLLSL
- Improving Levenberg-Marquardt Algorithm for Neural Networks by Omead Pooladzandi and Yiming Zhou, 2022. Algorithm: LM
- Rethinking Gauss-Newton for learning over-parameterized models by Michael Arbel et al., 2023.
- Exact Gauss-Newton Optimization for Training Deep Neural Networks by Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon, 2024. Algorithm: EGN
- Optimizing Neural Networks with Kronecker-factored Approximate Curvature by James Martens and Roger Grosse, 2015. Algorithm: K-FAC
- Second-order optimization with lazy Hessians by Nikita Doikov, El Mahdi Chayti, Martin Jaggi, 2022.
- Optax - mostly first-order accelerated methods
- Somax - second-order stochastic solvers
- JAXopt - deterministic second-order methods (e.g., Gauss-Newton, Levenberg-Marquardt) and the stochastic first-order methods PolyakSGD and ArmijoSGD (see the usage sketch after this list)
- KFAC-JAX - implementation of K-FAC from the DeepMind team
- AdaHessianJax - implementation of the AdaHessian optimizer by Nestor Demeure
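
Several entries above (PyHessian, AdaHessian, the Hessian-free papers) build on Hessian-vector products rather than an explicit Hessian. Below is a minimal, hedged sketch of a Hutchinson-style diagonal-Hessian estimate in JAX; the function name `hutchinson_hessian_diag` and the toy quadratic are illustrative assumptions, not code taken from any of the listed papers or libraries.

```python
import jax
import jax.numpy as jnp

def hutchinson_hessian_diag(loss_fn, params, key, num_samples=10):
    """Estimate diag(H) via Hutchinson's method: E[z * (H z)] with Rademacher z."""
    grad_fn = jax.grad(loss_fn)

    def hvp(v):
        # Hessian-vector product as the JVP of the gradient (no explicit Hessian).
        return jax.jvp(grad_fn, (params,), (v,))[1]

    def one_sample(k):
        # Rademacher probe vector z with entries in {-1, +1}.
        z = jax.random.rademacher(k, params.shape).astype(params.dtype)
        return z * hvp(z)

    keys = jax.random.split(key, num_samples)
    return jnp.mean(jax.vmap(one_sample)(keys), axis=0)

# Toy quadratic whose Hessian diagonal is known exactly: diag(A) = [1, 2, 3].
A = jnp.diag(jnp.array([1.0, 2.0, 3.0]))
loss = lambda w: 0.5 * w @ A @ w
w0 = jnp.ones(3)
print(hutchinson_hessian_diag(loss, w0, jax.random.PRNGKey(0), num_samples=64))
```

For a diagonal quadratic the estimate is exact (each `z_i**2 == 1`), which makes it a convenient sanity check; AdaHessian-style optimizers additionally smooth such diagonal estimates with moving averages before preconditioning the gradient.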
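
The JAXopt entry mentions deterministic Gauss-Newton and Levenberg-Marquardt solvers; here is a small, hedged usage sketch. The residual function and the toy exponential-fit data are made-up illustrations; `jaxopt.GaussNewton` (and its `run` method) is the only library API assumed.

```python
import jax.numpy as jnp
import jaxopt

# Made-up curve-fitting problem: fit y ≈ a * exp(-b * x) with true (a, b) = (2, 3).
x = jnp.linspace(0.0, 1.0, 20)
y = 2.0 * jnp.exp(-3.0 * x)

def residuals(params, x, y):
    # Gauss-Newton minimizes 0.5 * ||residuals(params)||^2.
    a, b = params
    return a * jnp.exp(-b * x) - y

# jaxopt.LevenbergMarquardt(residual_fun=residuals) can be swapped in the same way.
solver = jaxopt.GaussNewton(residual_fun=residuals, maxiter=20)
params, state = solver.run(jnp.array([1.0, 1.0]), x, y)
print(params)  # should approach (2.0, 3.0)
```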