- SGD [Book]
- Momentum [Book]
- RMSProp [Book]
- AdaGrad [Link]
- ADAM [Link]
- AdaBound [Link] [Github]
- ADAMAX [Link]
- NADAM [Link]
- BatchNorm [Link]
- Weight Norm [Link]
- Spectral Norm [Link]
- Cosine Normalization [Link]
- L2 Regularization versus Batch and Weight Normalization Link
- Convex Neural Networks [Link]
- Breaking the Curse of Dimensionality with Convex Neural Networks [Link]
- Understanding Deep Learning Requires Rethinking Generalization [Link]
- Optimal Control Via Neural Networks: A Convex Approach. [Link]
- Input Convex Neural Networks [Link]
- A New Concept of Convex based Multiple Neural Networks Structure. [Link]
- SGD Converges to Global Minimum in Deep Learning via Star-convex Path [Link]
- A Convergence Theory for Deep Learning via Over-Parameterization Link
- Curriculum Learning [Link]
- Solving Rubik’s Cube with a Robot Hand Link
- Noisy Activation Function [Link]
- Mollifying Networks [Link]
- Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks Link Talk
- Automated Curriculum Learning for Neural Networks Link
- On The Power of Curriculum Learning in Training Deep Networks Link
- On-line Adaptative Curriculum Learning for GANs Link
- Parameter Continuation with Secant Approximation for Deep Neural Networks and Step-up GAN Link
- HashNet: Deep Learning to Hash by Continuation. [Link]
- Learning Combinations of Activation Functions. [Link]
- Learning and development in neural networks: The importance of starting small (1993) Link
- Flexible shaping: How learning in small steps helps Link
- Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning Link
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation Link
- Parameter Continuation Methods for the Optimization of Deep Neural Networks Link
- Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection [Link](https://www.aclweb.org/anthology/W18-6314.pdf)
- Reinforcement Learning based Curriculum Optimization for Neural Machine Translation Link
- Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning Link
- Entropy-SGD: Biasing Gradient Descent into Wide Valleys Link
- Neighbourhood Distillation: On the Benefits of Non End-to-End Distillation Link
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
- Qualitatively Characterizing Neural Network Optimization Problems [Link]
- The Loss Surfaces of Multilayer Networks [Link]
- Visualizing the Loss Landscape of Neural Nets [Link]
- The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link]
- How regularization affects the critical points in linear networks. [Link]
- Local minima in training of neural networks [Link]
- Necessary and Sufficient Geometries for Gradient Methods Link
- Fine-grained Optimization of Deep Neural Networks Link
- Score-Based Generative Modeling through Stochastic Differential Equations Link
- Deep Equilibrium Models Link
- Bifurcations of Recurrent Neural Networks in Gradient Descent Learning [Link]
- On the difficulty of training recurrent neural networks [Link]
- Understanding and Controlling Memory in Recurrent Neural Networks [Link]
- Dynamics and Bifurcation of Neural Networks [Link]
- Context Aware Machine Learning [Link]
- The trade-off between long-term memory and smoothness for recurrent networks [Link]
- Dynamical complexity and computation in recurrent neural networks beyond their fixed point [Link]
- Bifurcations in discrete-time neural networks: controlling complex network behaviour with inputs [Link]
- Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors [Link]
- Bifurcation analysis of a neural network model Link
- A Differentiable Physics Engine for Deep Learning in Robotics Link
- Deep learning for universal linear embeddings of nonlinear dynamics Link
- Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations Link
- Analysis of gradient descent learning algorithms for multilayer feedforward neural networks Link
- A dynamical model for the analysis and acceleration of learning in feedforward networks Link
- A bio-inspired bistable recurrent cell allows for long-lasting memory Link
- Adding One Neuron Can Eliminate All Bad Local Minima Link
- Deep Learning without Poor Local Minima Link
- Elimination of All Bad Local Minima in Deep Learning Link
- How to escape saddle points efficiently. Link
- Depth with Nonlinearity Creates No Bad Local Minima in ResNets Link
- Deep learning course notes Link
- On the importance of initialization and momentum in deep learning Link
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks Link
- The Early Phase of Neural Network Training Link
- One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers Link
- PCA-Initialized Deep Neural Networks Applied To Document Image Analysis Link
- Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Link
- Learning a Multitask Curriculum for Neural Machine Translation. Link
- Self-paced Curriculum Learning. Link
- Curriculum Learning of Multiple Tasks. Link
- A Primal-Dual Formulation for Deep Learning with Constraints Link
- Object-Oriented Curriculum Generation for Reinforcement Learning Link
- Teacher-Student Curriculum Learning Link
- https://www.offconvex.org/
- An overview of gradient descent optimization algorithms [Link]
- Why Momentum Really Works [Blog]
- Optimization [Book]
- Optimization for deep learning: theory and algorithms Link
- Generalization Error in Deep Learning Link
- Automatic Differentiation in Machine Learning: a Survey Link
- Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey Link
- Automatic Curriculum Learning For Deep RL: A Short Survey Link
If you've found any informative resources that you think belong here, be sure to submit a pull request or create an issue!