
A curated list of awesome Deep Learning theories that shed light on the mysteries of DL

Awesome Deep Learning Theory

Tutorials and other resources

Materials: Homepage, Slides

Notes:

  • Covers the roles of convexity, overparameterization, depth, unsupervised learning/GANs, and simpler networks (compression)
  • Presents many important phenomena and simple theories.

2. ICML 2018 Invited Talk: Max Welling - Intelligence per Kilowatt-hour

Materials: YouTube (not from ICML)

  • Various views, not only on intelligence per kilowatt-hour, but also on philosophical questions about a computational view of the world.

3. ICML 2018 Invited Talk: Josh Tenenbaum - Building Machines that Learn and Think Like People

Materials: paper (55 pages), ICML video

  • His view on the importance of probabilistic programming languages (BayesFlow, ProbTorch, WebPPL, ...)
  • Symbolic representations are good for abstract reasoning.
  • The intuitive physics engine in our minds (present even in newborns)
  • The hard problem of learning: learning programs is hard (a difficult loss function to optimize)
  • Big question: how do we connect symbolic representations and neural nets?

Papers at ICML 2018

Materials: pdf, code

  • From Michael Jordan's group
  • Instancewise feature selection as a methodology for model interpretation
  • Learn a function to extract a subset of features that are most informative for each given example
  • Develop an efficient variational approximation to the mutual information
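
A minimal sketch of the instancewise-selection idea, assuming hypothetical explainer/approximator networks and a soft top-k mask in place of the paper's subset sampling; the sizes and toy data are illustrative. Minimizing the approximator's cross-entropy on the masked input maximizes a variational lower bound on the mutual information between the selected features and the label.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, hidden = 20, 5, 64   # feature dim, number of selected features, width (illustrative)

explainer = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, d))
approximator = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 2))
opt = torch.optim.Adam(list(explainer.parameters()) + list(approximator.parameters()), lr=1e-2)

# Toy data: the label depends only on the sign of feature 0,
# so a good explainer should concentrate its mask there.
x = torch.randn(512, d)
y = (x[:, 0] > 0).long()

for _ in range(300):
    scores = explainer(x)                 # per-example feature importance scores
    mask = k * F.softmax(scores, dim=1)   # soft relaxation of "pick the top-k features"
    logits = approximator(x * mask)       # predict the label from the selected features only
    loss = F.cross_entropy(logits, y)     # variational lower bound on I(X_S; Y), up to H(Y)
    opt.zero_grad(); loss.backward(); opt.step()

print(F.softmax(explainer(x), dim=1).mean(0)[:3])  # feature 0 should dominate
```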

Materials: pdf, code

  • A nice combination of theory and practice
  • PAC-Bayes Bound (1999) -> PAC-Bayes Bound for Meta-Learning (2014)
  • Stochastic neural nets whose posteriors and priors are factorized Gaussians, so the KL term has a closed form (see below)
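
For reference, one standard (McAllester-style) form of the single-task PAC-Bayes bound, together with the closed-form KL that makes it tractable when the posterior Q and prior P are factorized Gaussians:

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for all posteriors Q (prior P fixed before seeing S):
\mathbb{E}_{h \sim Q}[L(h)] \;\le\; \mathbb{E}_{h \sim Q}[\hat{L}_S(h)]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}}

% Closed-form KL for factorized Gaussians:
\mathrm{KL}(Q \,\|\, P) = \sum_i \left[
  \ln\frac{\sigma_{P,i}}{\sigma_{Q,i}}
  + \frac{\sigma_{Q,i}^2 + (\mu_{Q,i} - \mu_{P,i})^2}{2\,\sigma_{P,i}^2}
  - \frac{1}{2} \right]
```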

Materials: pdf

  • Work by Christoph Lampert, who developed PAC-Bayes bounds for meta-learning.
  • Establish a data-dependent notion of algorithmic stability for SGD
  • Develop novel generalization bounds

Materials: pdf

  • A stable learning algorithm: a small change in the training data does not change the learned model too much
  • Stability -> Generalization
  • Establish novel generalization bounds for learning algorithms that converge to global minima.
  • Derive black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the empirical risk function
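
The underlying notion and its classical in-expectation consequence, roughly in the form of Bousquet and Elisseeff:

```latex
% A is \beta-uniformly stable if, for all datasets S, S' differing in one example
% and every test point z:
|\,\ell(A(S), z) - \ell(A(S'), z)\,| \;\le\; \beta

% Uniform stability controls the expected generalization gap:
\mathbb{E}_S\!\left[ L(A(S)) - \hat{L}_S(A(S)) \right] \;\le\; \beta
```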

Materials: pdf, supp

  • Although SGD is commonly viewed as a fast but less accurate version of gradient descent (GD), in practice it often finds better solutions than GD for modern neural nets.
  • To understand this phenomenon, the paper takes the alternative view that SGD works on a convolved (thus smoothed) version of the loss function.
  • Even if the function f has many bad local minima or saddle points, as long as for every point x the weighted average of the gradients of its neighborhood is one-point convex with respect to the desired solution x∗, SGD will get close to, and then stay around, x∗ with constant probability (stated informally below)
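
Stated informally, the two ingredients of this view are the smoothed loss and one-point convexity around the desired solution:

```latex
% Smoothed (convolved) loss, with \xi the SGD noise:
\tilde{f}(x) \;=\; \mathbb{E}_{\xi}\!\left[ f(x - \xi) \right]

% One-point convexity with respect to the desired solution x^*:
\langle \nabla \tilde{f}(x),\; x - x^* \rangle \;\ge\; c\,\|x - x^*\|^2
\quad \text{for all } x \text{ in a neighborhood of } x^*
```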

Materials: pdf, code

  • One of the biggest challenges in GAN research is assessing the quality of generated samples and detecting various levels of mode collapse
  • Propose a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation.

Materials: pdf, code

  • NNets can be compressed to reduce memory and computational requirements, or to increase accuracy by facilitating the use of a larger base architecture.
  • Focus on pruning individual neurons, which can simultaneously trim model size, FLOPs, and run-time memory.
  • Utilize the information bottleneck principle instantiated via a tractable variational bound
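
A generic sketch of the pruning mechanism (not the paper's exact objective): multiplicative Gaussian gates on each output neuron plus a signal-to-noise-style penalty. `GatedLinear`, the layer sizes, the penalty form, and its weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose outputs are scaled by stochastic gates z_i ~ N(mu_i, sigma_i^2).

    Neurons whose gates are pushed toward mu_i ~ 0 by the bottleneck penalty
    carry (almost) no information downstream and can be pruned after training.
    """
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
        self.mu = nn.Parameter(torch.ones(d_out))
        self.log_sigma = nn.Parameter(torch.full((d_out,), -1.0))

    def forward(self, x):
        h = self.lin(x)
        if self.training:  # reparameterized sample of the gates
            z = self.mu + torch.exp(self.log_sigma) * torch.randn_like(h)
        else:              # deterministic gates at test time
            z = self.mu
        return h * z

    def penalty(self):
        # Signal-to-noise-style compression term: only informative neurons
        # "pay" to keep a high-SNR gate; the rest are driven toward mu ~ 0.
        snr = self.mu ** 2 / torch.exp(2 * self.log_sigma)
        return torch.log1p(snr).sum()

layer = GatedLinear(32, 64)
h = layer(torch.randn(8, 32))
loss = h.pow(2).mean() + 1e-2 * layer.penalty()  # placeholder task loss + bottleneck penalty
loss.backward()
print(h.shape, layer.penalty().item())
```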

Materials: pdf, supp

  • Build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators.
  • A large class of DNs can be written as a composition of max-affine spline operators (MASOs).
  • Deep Nets are (learned) matched filter banks.
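
The central object, written out, together with the observation that a fully connected ReLU layer is the special case of R = 2 affine pieces per output (one of them identically zero):

```latex
% Max-affine spline operator (MASO): output coordinate k
\left[\mathrm{MASO}(x)\right]_k \;=\; \max_{r = 1,\dots,R}\; \langle A_{k,r},\, x \rangle + b_{k,r}

% A fully connected ReLU layer is the case R = 2 with one affine piece set to zero:
\mathrm{ReLU}\!\left( \langle w_k, x \rangle + c_k \right)
  \;=\; \max\!\big( \langle w_k, x \rangle + c_k,\; 0 \big)
```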

Materials: pdf, supp

  • The important role of depth in deep learning has been emphasized
  • Argue that sufficient width of a feedforward network is equally important, by answering the question of under which conditions the decision regions of a neural network are connected
  • A sufficiently wide hidden layer is necessary to guarantee that the network can produce disconnected decision regions

Materials: pdf, supp

  • Recent works use PAC-Bayes and Margin-based analyses, but do not as yet result in sample complexity bounds better than naive parameter counting
  • Noise stability -> compression -> better bounds
  • A data-dependent analysis yields stronger generalization bounds (an informal statement follows below)
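
Roughly, the resulting bound has the following flavor: if noise stability lets the network f be compressed to a network f̂ described by q parameters, each taking at most r discrete values, then the compressed net generalizes at a rate governed by q log r rather than by the raw parameter count.

```latex
% Informal statement: with high probability over the m training samples,
L_0(\hat{f}) \;\le\; \hat{L}_{\gamma}(f) \;+\; \tilde{O}\!\left( \sqrt{\frac{q \log r}{m}} \right)
% where \hat{L}_\gamma is the empirical margin loss of the original network f.
```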

Materials: pdf, supp

  • An interesting approach to analyzing NNets that has received renewed attention is to examine the equivalent kernel of the neural network.
  • A fully connected feedforward network with one hidden layer, a certain weight distribution, an activation function, and an infinite number of neurons can be viewed as a mapping into a Hilbert space.
  • Derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for all rotationally-invariant weight distributions
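
A quick numerical check of one such equivalent kernel: for ReLU units with standard Gaussian weights, the limiting kernel has a closed form (the degree-one arc-cosine kernel, up to scaling), which a Monte Carlo average over random hidden units should reproduce.

```python
import numpy as np

def relu_equivalent_kernel(x, y):
    # E_w[ReLU(w.x) ReLU(w.y)] for w ~ N(0, I):
    # ||x|| ||y|| (sin t + (pi - t) cos t) / (2 pi), with t the angle between x and y
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    t = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
W = rng.normal(size=(500_000, 3))                           # many random hidden units
mc = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))   # Monte Carlo estimate
print(relu_equivalent_kernel(x, y), mc)                     # the two values should be close
```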

Materials: pdf, supp

  • CNNs: equivariance to translation
  • Can we generalize to other group actions, e.g. rotations?
  • A more general notion of convolution on groups (a toy example follows below)
  • Fourier analysis on groups: spherical CNNs, steerability, and CNNs for manifolds
  • NNets for graphs: message passing neural nets
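
A toy example of convolution over a group, assuming the cyclic rotation group C4 and a lifting correlation built from rotated copies of a single filter; the check at the end illustrates equivariance, i.e. rotating the input rotates each spatial map and cyclically shifts the orientation channel.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
filt = rng.normal(size=(3, 3))

def c4_lift(img, psi):
    """Lifting correlation over C4: one spatial map per 90-degree-rotated filter."""
    return [correlate2d(img, np.rot90(psi, g), mode="valid") for g in range(4)]

out = c4_lift(image, filt)
out_rot = c4_lift(np.rot90(image), filt)

for g in range(4):
    # Rotating the input rotates each map and shifts the group index by one.
    assert np.allclose(out_rot[g], np.rot90(out[(g - 1) % 4]))
print("C4-equivariant:", True)
```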

Materials: pdf, supp

  • Study the number of linear regions, i.e. pieces, that a PWL function represented by a DNN can attain, both theoretically and empirically.
  • Tighter upper and lower bounds for the maximum number of linear regions on rectifier networks.
  • Indicate that a deep rectifier network can only have more linear regions than every shallow counterpart with the same number of neurons if that number exceeds the dimension of the input (an empirical counting sketch follows below)
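
An empirical counting sketch under illustrative assumptions (a tiny random ReLU net and a 1-D slice of input space): each distinct ReLU on/off pattern encountered along the slice corresponds to one linear piece crossed by that slice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny ReLU net 1 -> 8 -> 8 (widths are illustrative assumptions)
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)

xs = np.linspace(-10, 10, 20001).reshape(-1, 1)           # dense 1-D slice of input space
pre1 = xs @ W1.T + b1                                     # pre-activations, layer 1
pre2 = np.maximum(pre1, 0) @ W2.T + b2                    # pre-activations, layer 2
patterns = np.concatenate([pre1 > 0, pre2 > 0], axis=1)   # ReLU on/off pattern per point
n_regions = len(np.unique(patterns, axis=0))
print(n_regions)   # number of linear pieces the slice passes through
```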

Materials: pdf, supp

  • Backpropagation-based visualizations have been proposed to interpret convolutional neural networks (CNNs); however, a theory justifying their behavior has been missing.
  • Develop a theoretical explanation revealing that GBP and DeconvNet are essentially doing (partial) image recovery which is unrelated to the network decisions.

Materials: pdf, supp

  • Context: Everything in DL (CNNs, LSTMs, ReLU, ResNets, ...) depends on gradient descent to find (good) local minima. This fails in game settings, such as GANs, where there are multiple interacting losses.
  • Problem: This paper analyzes why GD fails in games with multiple interacting losses, like GANs, and proposes a way to adjust the gradient dynamics so that GD succeeds.
  • Approach:
    • Decompose the second-order dynamics into two components. The first is related to potential games, which reduces to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
    • Propose Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games (a toy example follows this list).
  • Result: SGA is competitive for finding Local Nash equilibria in GANs.
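
A toy sketch of SGA on the simplest adversarial (purely Hamiltonian) game, f1(x, y) = xy and f2(x, y) = -xy, chosen here only for illustration; the sign of the adjustment coefficient is fixed by hand rather than by the paper's alignment criterion.

```python
import numpy as np

def xi(v):
    """Simultaneous gradient of the toy zero-sum game f1(x, y) = x*y, f2(x, y) = -x*y."""
    x, y = v
    return np.array([y, -x])                # (df1/dx, df2/dy)

def step(v, lr=0.1, lam=1.0, sga=True):
    g = xi(v)
    if sga:
        # The Jacobian of xi is [[0, 1], [-1, 0]]: purely antisymmetric (A = J, S = 0),
        # so this game is "Hamiltonian" in the paper's decomposition.
        A = np.array([[0.0, 1.0], [-1.0, 0.0]])
        g = g + lam * A.T @ g               # symplectic gradient adjustment
    return v - lr * g

v_gd = v_sga = np.array([1.0, 1.0])
for _ in range(200):
    v_gd, v_sga = step(v_gd, sga=False), step(v_sga, sga=True)
print(np.linalg.norm(v_gd), np.linalg.norm(v_sga))  # plain GD spirals outward, SGA converges to 0
```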

Important Researchers