Awesome-Feature-Learning-in-Deep-Learning-Thoery

Welcome to the Awesome Feature Learning in Deep Learning Theory Reading Group! This repository serves as a collaborative platform for scholars, enthusiasts, and anyone interested in delving into the fascinating world of feature learning within deep learning theory.

Feature Learning in Deep Learning Theory Reading Group

Introduction

Welcome to the GitHub repository of our Feature Learning in Deep Learning Theory Reading Group! This group is dedicated to the study, discussion, and understanding of feature learning concepts and techniques in the field of Deep Learning.

Objective

Our objective is to bring together researchers, professionals, students, and anyone interested in feature learning to learn from one another, discuss recent advancements and challenges, and contribute to the knowledge pool of deep learning theory.

Participation

We warmly invite anyone interested to join us. To participate:

  1. Follow this Repository: Keep up to date with the reading materials we will be discussing.
  2. Join the Discussion: Participate in discussions on the Issues tab. Each paper will have a dedicated issue where the discussion will take place.

Reading List

The reading list will be updated on a weekly or bi-weekly basis with the papers and articles we plan to discuss. It is organized by topic below.

Classification

  • Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning, ICLR 2023. (link)

    Zeyuan Allen-Zhu, Yuanzhi Li

  • Benign Overfitting in Two-layer Convolutional Neural Networks, NeurIPS 2022. (link) (video)

    Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

  • Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective, ICML 2023 Workshop (Contributed Talk). (link)

    Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki.

  • Feature purification: How adversarial training performs robust deep learning, FOCS 2021. (link)

    Zeyuan Allen-Zhu, Yuanzhi Li

  • Toward understanding the feature learning process of self-supervised contrastive learning, ICML 2021. (link)

    Zixin Wen, Yuanzhi Li

  • Towards Understanding Mixture of Experts in Deep Learning, NeurIPS 2022. (link)

    Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li

  • Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization, ICLR 2023 (link)

    Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

  • Towards Understanding Feature Learning in Out-of-Distribution Generalization, NeurIPS 2023 (link)

    Wei Huang*, Yongqiang Chen*, Kaiwen Zhou*, Yatao Bian, Bo Han, James Cheng

  • Benign Overfitting for Two-layer ReLU Networks, ICML 2023. (link)

    Yiwen Kou*, Zixiang Chen*, Yuanzhou Chen, Quanquan Gu

  • Data Augmentation as Feature Manipulation, ICML 2022. (link)

    Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

  • Towards understanding how momentum improves generalization in deep learning, ICML 2022. (link)

    Samy Jelassi, Yuanzhi Li

  • The Benefits of Mixup for Feature Learning. (link)

    Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

  • Pruning Before Training May Improve Generalization, Provably. (link)

    Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang

  • Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup, ICML 2023. (link)

    Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

  • How Does Semi-supervised Learning with Pseudo-Labelers Work? A Case Study, ICLR 2023. (link)

    Yiwen Kou, Zixiang Chen, Yuan Cao, Quanquan Gu

  • Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels, NeurIPS 2021. (link)

    Stefani Karp, Ezra Winston, Yuanzhi Li, Aarti Singh

  • Provable Guarantees for Neural Networks via Gradient Feature Learning, NeurIPS 2023. (link)

    Zhenmei Shi*, Junyi Wei*, Yingyu Liang

  • Robust Learning with Progressive Data Expansion Against Spurious Correlation, NeurIPS 2023. (link)

    Yihe Deng, Yu Yang, Baharan Mirzasoleiman, Quanquan Gu

  • Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP, (link)

    Zixiang Chen*, Yihe Deng*, Yuanzhi Li, Quanquan Gu

  • Why Does Sharpness-Aware Minimization Generalize Better Than SGD? NeurIPS 2023, (link)

    Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu

  • Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data, COLT 2022, (link)

    Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

  • Random Feature Amplification: Feature Learning and Generalization in Neural Networks, JMLR 2023, (link)

    Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

  • Benign Overfitting in Adversarially Robust Linear Classification, UAI 2023, (link)

    Jinghui Chen, Yuan Cao, Quanquan Gu

  • Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates, ICLR 2024, (link)

    Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou

  • Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory, ICLR 2024, (link)

    Wei Huang, Ye Shi, Zhongyi Cai, Taiji Suzuki

  • Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data, ICLR 2024, (link)

    Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu

  • Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data. (link)

    Xuran Meng, Difan Zou, Yuan Cao

  • SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem, ICLR 2024, (link)

    Margalit Glasgow

  • Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond, NeurIPS 2023, (link)

    Taiji Suzuki, Denny Wu, Kazusato Oko, Atsushi Nitanda

  • Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data, ICLR 2024, (link)

    Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, Denny Wu

  • What Improves the Generalization of Graph Transformer? A Theoretical Dive into Self-attention and Positional Encoding, (link)

    Hongkang Li, Meng Wang, Tengfei Ma, Sijia Liu, Zaixi Zhang, Pin-Yu Chen

  • Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks, ICLR 2023, (link)

    Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Regression

  • Feature Learning in Infinite-Width Neural Networks, ICML 2021. (link)

    Greg Yang, Edward J. Hu

  • High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation, NeurIPS 2022. (link)

    Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

  • Gradient-Based Feature Learning under Structured Data, NeurIPS 2023. (link)

    Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

  • Neural Networks can Learn Representations with Gradient Descent, COLT 2022. (link)

    Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

  • Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks. (link)

    Eshaan Nichani, Alex Damian, Jason D. Lee

  • The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks, COLT 2022. (link)

    Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

  • Neural Networks Efficiently Learn Low-Dimensional Representations with SGD, ICLR 2023. (link)

    Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu

  • Learning Two-Layer Neural Networks, One (Giant) Step at a Time (link)

    Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

  • Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime, ICLR 2024, (link)

    Keita Suzuki, Taiji Suzuki

Transformers

  • Vision Transformers provably learn spatial structure, NeurIPS 2022. (link)

    Samy Jelassi, Michael E. Sander, Yuanzhi Li

  • A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity, ICLR 2023. (link)

    Hongkang Li, Meng Wang, Sijia Liu, Pin-yu Chen

  • Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer, NeurIPS 2023 (link)

    Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du

  • JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

    Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du

  • On the Role of Attention in Prompt-tuning, ICML 2023 (link)

    Samet Oymak*, Ankit Singh Rawat*, Mahdi Soltanolkotabi*, Christos Thrampoulidis*

  • In-Context Convergence of Transformers, NeurIPS 2023 workshop (link)

    Yu Huang, Yuan Cheng, Yingbin Liang

  • Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis, (link)

    Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Contact

For any queries, please open an issue or reach out to us via email at weihuang.uts@gmail.com.

Code of Conduct

We aim to maintain a respectful and inclusive environment for everyone, and we expect all participants to uphold this standard.

We look forward to your active participation and happy reading!