awesome-nn-optimization

Awesome list for Neural Network Optimization methods.

Creative Commons Attribution 4.0 International (CC-BY-4.0)

Content

Popular Optimization Algorithms

Normalization Methods

  • BatchNorm [Link]
  • Weight Norm [Link]
  • Spectral Norm [Link]
  • Cosine Normalization [Link]
  • L2 Regularization versus Batch and Weight Normalization Link
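
A minimal sketch of the batch-normalization forward pass (the first entry above), written in plain NumPy rather than taken from any of the linked papers; `gamma`, `beta`, and `eps` are the usual learnable scale, shift, and numerical-stability constant. A full implementation would also track running statistics for use at inference time.

```python
# Illustrative sketch of the BatchNorm forward pass (training-time statistics only).
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a batch of activations x of shape (N, D) per feature."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # standardized activations
    return gamma * x_hat + beta             # learned scale and shift

x = np.random.randn(32, 4)
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))    # ~0 mean, ~1 std per feature
```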

On Convexity and Generalization of Neural Networks

  • Convex Neural Networks [Link]
  • Breaking the Curse of Dimensionality with Convex Neural Networks [Link]
  • Understanding Deep Learning Requires Rethinking Generalization [Link]
  • Optimal Control Via Neural Networks: A Convex Approach. [Link]
  • Input Convex Neural Networks [Link]
  • A New Concept of Convex based Multiple Neural Networks Structure. [Link]
  • SGD Converges to Global Minimum in Deep Learning via Star-convex Path [Link]
  • A Convergence Theory for Deep Learning via Over-Parameterization Link

Continuation Methods and Curriculum Learning

  • Curriculum Learning [Link]
  • Solving Rubik’s Cube with a Robot Hand Link
  • Noisy Activation Function [Link]
  • Mollifying Networks [Link]
  • Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks Link Talk
  • Automated Curriculum Learning for Neural Networks Link
  • On The Power of Curriculum Learning in Training Deep Networks Link
  • On-line Adaptative Curriculum Learning for GANs Link
  • Parameter Continuation with Secant Approximation for Deep Neural Networks and Step-up GAN Link
  • HashNet: Deep Learning to Hash by Continuation. [Link]
  • Learning Combinations of Activation Functions. [Link]
  • Learning and development in neural networks: The importance of starting small (1993) Link
  • Flexible shaping: How learning in small steps helps Link
  • Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning Link
  • Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation Link
  • Parameter Continuation Methods for the Optimization of Deep Neural Networks Link
  • Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection [Link](https://www.aclweb.org/anthology/W18-6314.pdf)
  • Reinforcement Learning based Curriculum Optimization for Neural Machine Translation Link
  • Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning Link
  • Entropy-SGD: Biasing Gradient Descent into Wide Valleys Link
  • Neighbourhood Distillation: On the Benefits of Non End-to-End Distillation Link
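
As a rough illustration of the easy-to-hard idea behind several entries above (Bengio et al.'s Curriculum Learning in particular), here is a hypothetical NumPy sketch in which a per-example difficulty score orders the data and the training pool grows in stages; the difficulty proxy and pacing schedule are placeholders, not taken from any of the listed papers.

```python
# Hypothetical curriculum schedule: train on the easiest examples first,
# then gradually admit harder ones; the scoring and pacing are illustrative.
import numpy as np

def curriculum_batches(X, y, difficulty, n_stages=4, batch_size=32):
    """Yield mini-batches drawn from a growing, easy-to-hard subset of (X, y)."""
    order = np.argsort(difficulty)                # easiest examples first
    for stage in range(1, n_stages + 1):
        cutoff = int(len(X) * stage / n_stages)   # admit a larger slice each stage
        pool = order[:cutoff].copy()
        np.random.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            idx = pool[i:i + batch_size]
            yield X[idx], y[idx]

# Example: difficulty proxied by the input norm (purely illustrative).
X = np.random.randn(1000, 10)
y = np.random.randint(0, 2, size=1000)
for xb, yb in curriculum_batches(X, y, difficulty=np.linalg.norm(X, axis=1)):
    pass  # one optimizer step per (xb, yb) would go here
```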

On Loss Surfaces and Generalization of Deep Neural Networks

  • Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
  • Qualitatively Characterizing Neural Network Optimization Problems [Link]
  • The Loss Surfaces of Multilayer Networks [Link]
  • Visualizing the Loss Landscape of Neural Nets [Link]
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link]
  • How regularization affects the critical points in linear networks. [Link]
  • Local minima in training of neural networks [Link]
  • Necessary and Sufficient Geometries for Gradient Methods Link
  • Fine-grained Optimization of Deep Neural Networks Link
  • Score-Based Generative Modeling through Stochastic Differential Equations Link
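
One of the simplest experiments behind "Qualitatively Characterizing Neural Network Optimization Problems" and the loss-landscape visualization papers above is to plot the loss along the straight line between two parameter vectors, e.g. the initialization and a trained solution. A toy sketch with a linear least-squares model (the papers do this for deep networks) might look like the following; the data and model are placeholders.

```python
# Toy 1-D interpolation of the loss between an initial and a "trained" parameter
# vector, in the spirit of the linear-path experiments cited above.
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model y ~ X @ w."""
    return np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

w_init = rng.normal(size=5)                      # theta_0: random initialization
w_final = np.linalg.lstsq(X, y, rcond=None)[0]   # theta_*: a trained solution

for alpha in np.linspace(0.0, 1.0, 11):
    w = (1 - alpha) * w_init + alpha * w_final   # point on the line theta(alpha)
    print(f"alpha={alpha:.1f}  loss={loss(w, X, y):.4f}")
```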

Dynamics, Bifurcations, and the Difficulty of Training RNNs

  • Deep Equilibrium Models Link
  • Bifurcations of Recurrent Neural Networks in Gradient Descent Learning [Link]
  • On the difficulty of training recurrent neural networks [Link]
  • Understanding and Controlling Memory in Recurrent Neural Networks [Link]
  • Dynamics and Bifurcation of Neural Networks [Link]
  • Context Aware Machine Learning [Link]
  • The trade-off between long-term memory and smoothness for recurrent networks [Link]
  • Dynamical complexity and computation in recurrent neural networks beyond their fixed point [Link]
  • Bifurcations in discrete-time neural networks: controlling complex network behaviour with inputs [Link]
  • Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors [Link]
  • Bifurcation analysis of a neural network model Link
  • A Differentiable Physics Engine for Deep Learning in Robotics Link
  • Deep learning for universal linear embeddings of nonlinear dynamics Link
  • Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations Link
  • Analysis of gradient descent learning algorithms for multilayer feedforward neural networks Link
  • A dynamical model for the analysis and acceleration of learning in feedforward networks Link
  • A bio-inspired bistable recurrent cell allows for long-lasting memory Link
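
A small numerical sketch of the mechanism behind "On the difficulty of training recurrent neural networks": for a linear recurrence h_t = W h_{t-1}, the Jacobian of h_T with respect to h_0 is the T-fold product of W, so backpropagated gradient norms shrink or blow up with the spectral radius of W. The state dimension and horizon below are arbitrary choices.

```python
# Illustrative vanishing/exploding gradient demo for a linear recurrence h_t = W h_{t-1}.
import numpy as np

rng = np.random.default_rng(0)

def jacobian_norms(spectral_radius, T=50, n=20):
    W = rng.normal(size=(n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius
    J = np.eye(n)
    norms = []
    for _ in range(T):
        J = W @ J                        # accumulate the Jacobian d h_T / d h_0
        norms.append(np.linalg.norm(J, 2))
    return norms

print(jacobian_norms(0.9)[-1])   # shrinks toward 0: vanishing gradients
print(jacobian_norms(1.1)[-1])   # grows without bound: exploding gradients
```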

Poor Local Minima?

  • Adding One Neuron Can Eliminate All Bad Local Minima Link
  • Deep Learning without Poor Local Minima Link
  • Elimination of All Bad Local Minima in Deep Learning Link
  • How to escape saddle points efficiently. Link
  • Depth with Nonlinearity Creates No Bad Local Minima in ResNets Link
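
To make the saddle-point discussion concrete, here is a toy version of the perturbed gradient descent idea from "How to escape saddle points efficiently": when the gradient is nearly zero, add a small random perturbation so the iterate can slide off the saddle along a direction of negative curvature. The test function, step size, and thresholds are illustrative choices, not the constants from the paper.

```python
# Toy perturbed gradient descent on f(x, y) = x^2 - y^2, which has a strict saddle at the origin.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])      # gradient of x^2 - y^2

rng = np.random.default_rng(0)
p = np.array([1e-8, 1e-8])                # start essentially at the saddle point
lr, g_thresh, radius = 0.1, 1e-6, 1e-3

for step in range(200):
    g = grad(p)
    if np.linalg.norm(g) < g_thresh:      # near a first-order stationary point
        p = p + rng.uniform(-radius, radius, size=2)  # random perturbation
    else:
        p = p - lr * g                    # ordinary gradient step

print(p)  # the y-coordinate grows: the iterate has escaped along the negative curvature
```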

Initialization of Neural Networks

  • Deep learning course notes Link
  • On the importance of initialization and momentum in deep learning Link
  • The Break-Even Point on Optimization Trajectories of Deep Neural Networks Link
  • The Early Phase of Neural Network Training Link
  • One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers Link
  • PCA-Initialized Deep Neural Networks Applied To Document Image Analysis Link
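
For reference, a minimal NumPy sketch of two standard initialization heuristics that the entries above build on or compare against: Xavier/Glorot scaling for tanh-like units and He scaling for ReLU. The layer sizes are arbitrary examples.

```python
# Illustrative Xavier/Glorot and He initializations; fan_in/fan_out are the layer widths.
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))   # keeps activation variance roughly constant
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out):
    std = np.sqrt(2.0 / fan_in)                 # compensates for ReLU halving the variance
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W1 = xavier_uniform(784, 256)   # e.g. first layer of an MNIST MLP
W2 = he_normal(256, 10)         # ReLU-friendly scaling for the next layer
print(W1.std(), W2.std())
```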

Multi-Task Learning with Curricula

  • Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Link
  • Learning a Multitask Curriculum for Neural Machine Translation. Link
  • Self-paced Curriculum Learning. Link
  • Curriculum Learning of Multiple Tasks. Link

Constrained Optimization for Deep Learning

  • A Primal-Dual Formulation for Deep Learning with Constraints Link
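
A toy sketch of the primal-dual idea in the entry above, on a one-dimensional problem rather than a deep network: take gradient descent steps on the primal variable, gradient ascent steps on the Lagrange multiplier of an inequality constraint, and project the multiplier back to be nonnegative. All constants are illustrative.

```python
# Toy primal-dual updates for: minimize (x - 3)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
x, lam = 0.0, 0.0          # primal variable and dual multiplier
lr_x, lr_lam = 0.05, 0.05  # primal and dual step sizes

for _ in range(2000):
    grad_x = 2 * (x - 3) + lam     # d/dx of the Lagrangian (x - 3)^2 + lam * (x - 1)
    x -= lr_x * grad_x             # primal descent step
    lam += lr_lam * (x - 1)        # dual ascent on the constraint violation
    lam = max(lam, 0.0)            # multipliers of inequality constraints stay >= 0

print(x, lam)  # approaches the constrained optimum x = 1 with multiplier lam = 4
```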

Reinforcement Learning and Curriculum

  • Object-Oriented Curriculum Generation for Reinforcement Learning Link
  • Teacher-Student Curriculum Learning Link

Tutorials, Surveys and Blogs

  • https://www.offconvex.org/
  • An overview of gradient descent optimization algorithms [Link]
  • Why Momentum Really Works [Blog]
  • Optimization [Book]
  • Optimization for deep learning: theory and algorithms Link
  • Generalization Error in Deep Learning Link
  • Automatic Differentiation in Machine Learning: a Survey Link
  • Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey Link
  • Automatic Curriculum Learning For Deep RL: A Short Survey Link

Contributing

If you've found any informative resources that you think belong here, be sure to submit a pull request or create an issue!