awesome-nn-optimization

Awesome list for Neural Network Optimization methods.

Creative Commons Attribution 4.0 International (CC-BY-4.0)

Content

Popular Optimization Algorithms

Normalization Methods

  • BatchNorm [Link]
  • Weight Norm [Link]
  • Spectral Norm [Link]
  • Cosine Normalization [Link]
  • L2 Regularization versus Batch and Weight Normalization Link
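
A minimal sketch of the batch-normalization forward pass (the first entry above), written in plain NumPy rather than taken from any of the linked papers; `gamma`, `beta`, and `eps` are the usual learnable scale, shift, and numerical-stability constant. A full implementation would also track running statistics for use at inference time.

```python
# Illustrative sketch of the BatchNorm forward pass (training-time statistics only).
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a batch of activations x of shape (N, D) per feature."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # standardized activations
    return gamma * x_hat + beta             # learned scale and shift

x = np.random.randn(32, 4)
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))    # ~0 mean, ~1 std per feature
```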

On Convexity and Generalization of Neural Networks

  • Convex Neural Networks [Link]
  • Breaking the Curse of Dimensionality with Convex Neural Networks [Link]
  • Understanding Deep Learning Requires Rethinking Generalization [Link]
  • Optimal Control Via Neural Networks: A Convex Approach. [Link]
  • Input Convex Neural Networks [Link]
  • A New Concept of Convex based Multiple Neural Networks Structure. [Link]
  • SGD Converges to Global Minimum in Deep Learning via Star-convex Path [Link]
  • A Convergence Theory for Deep Learning via Over-Parameterization Link

Continuation Methods and Curriculum Learning

  • Curriculum Learning [Link]
  • Solving Rubik’s Cube with a Robot Hand Link
  • Noisy Activation Function [Link]
  • Mollifying Networks [Link]
  • Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks Link Talk
  • Automated Curriculum Learning for Neural Networks Link
  • On The Power of Curriculum Learning in Training Deep Networks Link
  • On-line Adaptative Curriculum Learning for GANs Link
  • Parameter Continuation with Secant Approximation for Deep Neural Networks and Step-up GAN Link
  • HashNet: Deep Learning to Hash by Continuation. [Link]
  • Learning Combinations of Activation Functions. [Link]
  • Learning and development in neural networks: The importance of starting small (1993) Link
  • Flexible shaping: How learning in small steps helps Link
  • Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning Link
  • Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation Link
  • Parameter Continuation Methods for the Optimization of Deep Neural Networks Link
  • Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection [Link](https://www.aclweb.org/anthology/W18-6314.pdf)
  • Reinforcement Learning based Curriculum Optimization for Neural Machine Translation Link
  • Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning Link
  • Entropy-SGD: Biasing Gradient Descent into Wide Valleys Link
  • Neighbourhood Distillation: On the Benefits of Non End-to-End Distillation Link
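
As a rough illustration of the easy-to-hard idea behind several entries above (Bengio et al.'s Curriculum Learning in particular), here is a hypothetical NumPy sketch in which a per-example difficulty score orders the data and the training pool grows in stages; the difficulty proxy and pacing schedule are placeholders, not taken from any of the listed papers.

```python
# Hypothetical curriculum schedule: train on the easiest examples first,
# then gradually admit harder ones; the scoring and pacing are illustrative.
import numpy as np

def curriculum_batches(X, y, difficulty, n_stages=4, batch_size=32):
    """Yield mini-batches drawn from a growing, easy-to-hard subset of (X, y)."""
    order = np.argsort(difficulty)                # easiest examples first
    for stage in range(1, n_stages + 1):
        cutoff = int(len(X) * stage / n_stages)   # admit a larger slice each stage
        pool = order[:cutoff].copy()
        np.random.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            idx = pool[i:i + batch_size]
            yield X[idx], y[idx]

# Example: difficulty proxied by the input norm (purely illustrative).
X = np.random.randn(1000, 10)
y = np.random.randint(0, 2, size=1000)
for xb, yb in curriculum_batches(X, y, difficulty=np.linalg.norm(X, axis=1)):
    pass  # one optimizer step per (xb, yb) would go here
```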

On Loss Surfaces and Generalization of Deep Neural Networks

  • Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
  • Qualitatively Characterizing Neural Network Optimization Problems [Link]
  • The Loss Surfaces of Multilayer Networks [Link]
  • Visualizing the Loss Landscape of Neural Nets [Link]
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link]
  • How regularization affects the critical points in linear networks. [Link]
  • Local minima in training of neural networks [Link]
  • Necessary and Sufficient Geometries for Gradient Methods Link
  • Fine-grained Optimization of Deep Neural Networks Link
  • Score-Based Generative Modeling through Stochastic Differential Equations Link
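
One of the simplest experiments behind "Qualitatively Characterizing Neural Network Optimization Problems" and the loss-landscape visualization papers above is to plot the loss along the straight line between two parameter vectors, e.g. the initialization and a trained solution. A toy sketch with a linear least-squares model (the papers do this for deep networks) might look like the following; the data and model are placeholders.

```python
# Toy 1-D interpolation of the loss between an initial and a "trained" parameter
# vector, in the spirit of the linear-path experiments cited above.
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model y ~ X @ w."""
    return np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

w_init = rng.normal(size=5)                      # theta_0: random initialization
w_final = np.linalg.lstsq(X, y, rcond=None)[0]   # theta_*: a trained solution

for alpha in np.linspace(0.0, 1.0, 11):
    w = (1 - alpha) * w_init + alpha * w_final   # point on the line theta(alpha)
    print(f"alpha={alpha:.1f}  loss={loss(w, X, y):.4f}")
```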

Dynamics, Bifurcations, and the Difficulty of Training RNNs

  • Deep Equilibrium Models Link
  • Bifurcations of Recurrent Neural Networks in Gradient Descent Learning [Link]
  • On the difficulty of training recurrent neural networks [Link]
  • Understanding and Controlling Memory in Recurrent Neural Networks [Link]
  • Dynamics and Bifurcation of Neural Networks [Link]
  • Context Aware Machine Learning [Link]
  • The trade-off between long-term memory and smoothness for recurrent networks [Link]
  • Dynamical complexity and computation in recurrent neural networks beyond their fixed point [Link]
  • Bifurcations in discrete-time neural networks: controlling complex network behaviour with inputs [Link]
  • Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors [Link]
  • Bifurcation analysis of a neural network model Link
  • A Differentiable Physics Engine for Deep Learning in Robotics Link
  • Deep learning for universal linear embeddings of nonlinear dynamics Link
  • Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations Link
  • Analysis of gradient descent learning algorithms for multilayer feedforward neural networks Link
  • A dynamical model for the analysis and acceleration of learning in feedforward networks Link
  • A bio-inspired bistable recurrent cell allows for long-lasting memory Link
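
A small numerical sketch of the mechanism behind "On the difficulty of training recurrent neural networks": for a linear recurrence h_t = W h_{t-1}, the Jacobian of h_T with respect to h_0 is the T-fold product of W, so backpropagated gradient norms shrink or blow up with the spectral radius of W. The state dimension and horizon below are arbitrary choices.

```python
# Illustrative vanishing/exploding gradient demo for a linear recurrence h_t = W h_{t-1}.
import numpy as np

rng = np.random.default_rng(0)

def jacobian_norms(spectral_radius, T=50, n=20):
    W = rng.normal(size=(n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius
    J = np.eye(n)
    norms = []
    for _ in range(T):
        J = W @ J                        # accumulate the Jacobian d h_T / d h_0
        norms.append(np.linalg.norm(J, 2))
    return norms

print(jacobian_norms(0.9)[-1])   # shrinks toward 0: vanishing gradients
print(jacobian_norms(1.1)[-1])   # grows without bound: exploding gradients
```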

Poor Local Minima?

  • Adding One Neuron Can Eliminate All Bad Local Minima Link
  • Deep Learning without Poor Local Minima Link
  • Elimination of All Bad Local Minima in Deep Learning Link
  • How to escape saddle points efficiently. Link
  • Depth with Nonlinearity Creates No Bad Local Minima in ResNets Link
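
To make the saddle-point discussion concrete, here is a toy version of the perturbed gradient descent idea from "How to escape saddle points efficiently": when the gradient is nearly zero, add a small random perturbation so the iterate can slide off the saddle along a direction of negative curvature. The test function, step size, and thresholds are illustrative choices, not the constants from the paper.

```python
# Toy perturbed gradient descent on f(x, y) = x^2 - y^2, which has a strict saddle at the origin.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])      # gradient of x^2 - y^2

rng = np.random.default_rng(0)
p = np.array([1e-8, 1e-8])                # start essentially at the saddle point
lr, g_thresh, radius = 0.1, 1e-6, 1e-3

for step in range(200):
    g = grad(p)
    if np.linalg.norm(g) < g_thresh:      # near a first-order stationary point
        p = p + rng.uniform(-radius, radius, size=2)  # random perturbation
    else:
        p = p - lr * g                    # ordinary gradient step

print(p)  # the y-coordinate grows: the iterate has escaped along the negative curvature
```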

Initialization of Neural Networks

  • Deep learning course notes Link
  • On the importance of initialization and momentum in deep learning Link
  • The Break-Even Point on Optimization Trajectories of Deep Neural Networks Link
  • The Early Phase of Neural Network Training Link
  • One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers Link
  • PCA-Initialized Deep Neural Networks Applied To Document Image Analysis Link
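
For reference, a minimal NumPy sketch of two standard initialization heuristics that the entries above build on or compare against: Xavier/Glorot scaling for tanh-like units and He scaling for ReLU. The layer sizes are arbitrary examples.

```python
# Illustrative Xavier/Glorot and He initializations; fan_in/fan_out are the layer widths.
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))   # keeps activation variance roughly constant
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out):
    std = np.sqrt(2.0 / fan_in)                 # compensates for ReLU halving the variance
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W1 = xavier_uniform(784, 256)   # e.g. first layer of an MNIST MLP
W2 = he_normal(256, 10)         # ReLU-friendly scaling for the next layer
print(W1.std(), W2.std())
```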

Multi-Task Learning with Curricula

  • Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Link
  • Learning a Multitask Curriculum for Neural Machine Translation. Link
  • Self-paced Curriculum Learning. Link
  • Curriculum Learning of Multiple Tasks. Link

Constrained Optimization for Deep Learning

  • A Primal-Dual Formulation for Deep Learning with Constraints Link
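
A toy sketch of the primal-dual idea in the entry above, on a one-dimensional problem rather than a deep network: take gradient descent steps on the primal variable, gradient ascent steps on the Lagrange multiplier of an inequality constraint, and project the multiplier back to be nonnegative. All constants are illustrative.

```python
# Toy primal-dual updates for: minimize (x - 3)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
x, lam = 0.0, 0.0          # primal variable and dual multiplier
lr_x, lr_lam = 0.05, 0.05  # primal and dual step sizes

for _ in range(2000):
    grad_x = 2 * (x - 3) + lam     # d/dx of the Lagrangian (x - 3)^2 + lam * (x - 1)
    x -= lr_x * grad_x             # primal descent step
    lam += lr_lam * (x - 1)        # dual ascent on the constraint violation
    lam = max(lam, 0.0)            # multipliers of inequality constraints stay >= 0

print(x, lam)  # approaches the constrained optimum x = 1 with multiplier lam = 4
```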

Reinforcement Learning and Curriculum

  • Object-Oriented Curriculum Generation for Reinforcement Learning Link
  • Teacher-Student Curriculum Learning Link

Tutorials, Surveys and Blogs

  • https://www.offconvex.org/
  • An overview of gradient descent optimization algorithms [Link]
  • Why Momentum Really Works [Blog]
  • Optimization [Book]
  • Optimization for deep learning: theory and algorithms Link
  • Generalization Error in Deep Learning Link
  • Automatic Differentiation in Machine Learning: a Survey Link
  • Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey Link
  • Automatic Curriculum Learning For Deep RL: A Short Survey Link

Contributing

If you've found any informative resources that you think belong here, be sure to submit a pull request or create an issue!