This repository is intended to help optimization researchers with literature reviews by offering an up-to-date list of papers and corresponding summaries.
If this repository has been useful in your research, please cite it using the "Cite this repository" option on GitHub. This repository would not have been possible without these open-source contributors. Thanks! 💖
- ✨ Awesome Optimizers 📉
- Survey Papers
- First-order Optimizers
- Second-order Optimizers
- Learned Optimizers
- Other Optimisation-Related Research
Symbol | Meaning | Count |
---|---|---|
📄 | Paper | 25 |
📤 | Summary | 7 |
💻 | Code | 0 |
- An overview of gradient descent optimization algorithms Sebastian Ruder; 2016
- Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers Robin M. Schmidt, Frank Schneider, Philipp Hennig; 2020
- Nesterov Accelerated Gradient momentum 📤 💻 Yuri Nesterov; Unknown
- KOALA: A Kalman Optimization Algorithm with Loss Adaptivity 📤 💻 Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro; 2021
- On the Momentum Term in Gradient Descent Learning Algorithms 📤 💻 Ning Qian; 1999
- Demon: Improved Neural Network Training with Momentum Decay John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis; 2021
- Symbolic Discovery of Optimization Algorithms 📤 💻 Xiangning Chen, Chen Liang, Da Huang; 2023
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization 📤 💻 John Duchi, Elad Hazan, Yoram Singer; 2011
- ADADELTA: An Adaptive Learning Rate Method 📤 💻 Matthew D. Zeiler; 2012
- Adam: A Method for Stochastic Optimization 📤 💻 Diederik P. Kingma, Jimmy Ba; 2014
- Decoupled Weight Decay Regularization 📤 💻 Ilya Loshchilov, Frank Hutter; 2017
- AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights 📤 💻 Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han; 2020
- AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan; 2020
- On the Variance of the Adaptive Learning Rate and Beyond 📤 💻 Liyuan Liu, Haoming Jiang, Pengcheng He; 2021
- Momentum Centering and Asynchronous Update for Adaptive Gradient Methods Juntang Zhuang, Yifan Ding, Tommy Tang, Nicha Dvornek, Sekhar Tatikonda, James S. Duncan; 2021
- Shampoo: Preconditioned Stochastic Tensor Optimization 📤 💻 Vineet Gupta, Tomer Koren, Yoram Singer; 2018
- Understanding and correcting pathologies in the training of learned optimizers 📤 💻 Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein; 2018
- Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves 📤 💻 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein; 2020
- VeLO: Training Versatile Learned Optimizers by Scaling Up 📤 💻 Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein; 2022
- Practical tradeoffs between memory, compute, and performance in learned optimizers 📤 💻 Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein; 2022
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks 📤 💻 Hongwei Yong, Jianqiang Huang, Xiansheng Hua, Lei Zhang; 2020
- On Empirical Comparisons of Optimizers for Deep Learning 📤 Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl; 2019
- Adam Can Converge Without Any Modification on Update Rules 📤 Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo; 2022
- Gradient Descent: The Ultimate Optimizer 📤 💻 Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer; 2019