Deep learning explained

A collection of papers that try to explain the mysteries of deep learning through theory and empirical evidence. See also the curated list of deep learning theory papers by Prof. Boris Hanin at Princeton.

Theory-oriented explanations
Empirical observations and explanations

Theory-oriented explanations

Information-theoretic

Towards a Unified Information-Theoretic Framework for Generalization, Nov. 9 2021. nips2021. Daniel Roy's group. non-vacuous generalization bound

Theory of training

SGD, loss landscape, learning dynamics, stochasticity, SGD for feature learning, learning curriculum, etc.

Don't Decay the Learning Rate, Increase the Batch Size, Nov. 2017. iclr2018.
Stochastic Training is Not Necessary for Generalization, Tom Goldstein's group, nips2021.
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect, Tuo Zhao's group.
Momentum Doesn't Change The Implicit Bias.
On the Implicit Biases of Architecture & Gradient Descent, 2021, Yisong Yue's group, implicit bias of gd
Parameter Prediction for Unseen Deep Architectures, Oct. 25 2021.
Gradient Starvation: A Learning Proclivity in Neural Networks, Oct. 26 2021. nips2021
What training reveals about neural network complexity, Oct. 29 2021.
A Loss Curvature Perspective on Training Instabilities of Deep Learning Models, iclr2022 submit
Permutation-Based SGD: Is Random Optimal?, iclr2022 submit
A General Analysis of Example-Selection for Stochastic Gradient Descent, iclr2022 submit
How many degrees of freedom do we need to train deep networks: a loss landscape perspective, Jul. 13 2021.
The Benefits of Implicit Regularization from SGD in Least Squares Problems, Aug. 10 2021 nips2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion, Dec. 2 2022.
Understanding Gradient Descent on Edge of Stability in Deep Learning, May 22 2022.
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example, Oct. 7 2022.
Neural Networks can Learn Representations with Gradient Descent, Jun. 30 2022. colt2022
Git Re-Basin: Merging Models modulo Permutation Symmetries, Sep. 11 2022. tweet1, tweet2, tweet3.
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima, Oct. 4 2022.
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent, Oct. 13 2022.
Grokking phase transitions in learning local rules with gradient descent, Oct. 26 2022.
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation, Jimmy Ba et al. arXiv May 3 2022.
Exact learning dynamics of deep linear networks with prior knowledge, nips2022. learning dynamics.
Handbook of Convergence Theorems for (Stochastic) Gradient Methods, Jan. 26 2023.

Neural Tangent Kernel

Learning sparse features can lead to overfitting in neural networks, Jun. 24 2022.
Limitations of the NTK for Understanding Generalization in Deep Learning, Jun. 20 2022.
The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks, Oct. 5 2022.

Understanding training tricks

Does Knowledge Distillation Really Work?, Jun. 10 2021. nips2021
Understanding Why Generalized Reweighting Does Not Improve Over ERM, Jan. 28 2022.

Implicit regularization

Limitation of characterizing implicit regularization by data-independent functions, Jan. 28 2022.
Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data, Oct. 2022.

Theory of representation learning

Neural Networks Efficiently Learn Low-Dimensional Representations with SGD, Sep. 29 2022. sgd
Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?, Jun. 13 2022.
Feature learning in neural networks and kernel machines that recursively learn features, Dec. 28 2022.
- From Mikhail Belkin's group.

Self-supervised learning

Exploring the Limits of Large Scale Pre-training, Google Research.

Contrastive learning

The Power of Contrast for Feature Learning: A Theoretical Analysis, Oct. 2021. James Zou's group.
Sharp Learning Bounds for Contrastive Unsupervised Representation Learning, Oct. 2021. RIKEN AIP.
Can contrastive learning avoid shortcut solutions?, Jun. 21 2021. MIT and University of Pittsburgh.
Intriguing Properties of Contrastive Losses, Oct. 23 2021. Google Research.
Stochastic Contrastive Learning, Oct. 2021. interpretability
How Does Contrastive Pre-training Connect Disparate Domains?, nips2021
Contrastive Learning Can Find An Optimal Basis for Approximately View-Invariant Functions, arXiv Oct. 4 2022.
Do More Negative Samples Necessarily Hurt In Contrastive Learning?, Jun. 22 2022. icml2022.
Understanding Deep Contrastive Learning via Coordinate-wise Optimization, nips2022.
Understanding Contrastive Learning Requires Incorporating Inductive Biases, Feb. 28 2022.
Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning, Dec. 15 2022.

Explaining representational power

Emergence of Invariance and Disentanglement in Deep Representations, jmlr2018
Grounding Representation Similarity with Statistical Testing, Nov. 3 2021. representation comparison
Revisiting Model Stitching to Compare Neural Representations, Jun. 14 2021. representation comparison
Comparing Text Representations: A Theory-Driven Approach, Sep. 2021. sentence embedding
Discovering and Explaining The Representation Bottleneck of DNNs, iclr2022 submit
A theory of representation learning in deep neural networks gives a deep generalisation of kernel methods, Apr. 23 2023. icml2023.

Neural collapse

Prevalence of Neural Collapse during the terminal phase of deep learning training, Aug. 21 2020.

Empirical observations and explanations

Double descent

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle, Mar. 24 2023.

Mechanistic interpretability of DL

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers, Oct. 18 2022.

Generalization metrics

Neural Tangent Kernel Eigenvalues Accurately Predict Generalization, UCB, nips2021 spotlight.
Predicting Unreliable Predictions by Shattering a Neural Network, 2021, Yoshua Bengio's group.
On Predicting Generalization using GANs, Nov. 28 2021.
Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks, Nov. 25 2021.

Flatness

On the Maximum Hessian Eigenvalue and Generalization, Jun. 22 2022.

Decision boundary

Can You Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective, Dec. 16 2021.

Data-centric understanding

Deep Learning Through the Lens of Example Difficulty, Google Research 2021.
Deep Learning on a Data Diet: Finding Important Examples Early in Training, Jul. 15 2021. nips2021

Spurious correlation

See here for the detailed discussion on spurious correlation.

Lottery ticket hypothesis

Can You Win Everything with A Lottery Ticket?, TMLR 2022.

Memorization

Network size and weights size for memorization with two-layers neural networks, Nov. 3 2020.
What Do Neural Networks Learn When Trained With Random Labels?, nips2020.
Neural Networks Learning and Memorization with (almost) no Over-Parameterization, nips2020.
On the geometry of generalization and memorization in deep neural networks, iclr2021.
The Curious Case of Benign Memorization, Oct. 25 2022.
- "only the very last layers are used for memorization, while preceding layers encode performant features which remain largely unaffected by the label noise"
Distinguishing rule and exemplar-based generalization in learning systems, icml2022.
- The experimental setting has been applied to study the in-context ability of Transformers (see tweet).
Unintended memorisation of unique features in neural networks, May 20 2022.

Epsilon-Lee/deep-learning-explained