Awesome-Feature-Learning-in-Deep-Learning-Thoery

Welcome to the Awesome Feature Learning in Deep Learning Theory Reading Group! This repository serves as a collaborative platform for scholars, enthusiasts, and anyone interested in delving into the fascinating world of feature learning within deep learning theory.

Feature Learning in Deep Learning Theory Reading Group

Introduction

Welcome to the GitHub repository of our Feature Learning in Deep Learning Theory Reading Group! This group is dedicated to the study, discussion, and understanding of feature learning concepts and techniques in the field of Deep Learning.

Objective

Our objective is to bring together researchers, professionals, students, and anyone interested in feature learning to learn from one another, discuss recent advancements and challenges, and contribute to the knowledge pool of deep learning theory.

Participation

We warmly invite anyone interested to join us. To participate:

  1. Follow this Repository: Keep up to date with the reading materials we will be discussing.
  2. Join the Discussion: Participate in discussions on the Issues tab. Each paper will have a dedicated issue where the discussion will take place.

Reading List

The reading list will be updated on a weekly or bi-weekly basis with the papers and articles we plan to discuss. It is organized by topic below.

Classification

  • Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning, ICLR 2023. (link)

    Zeyuan Allen-Zhu, Yuanzhi Li

  • Benign Overfitting in Two-layer Convolutional Neural Networks, NeurIPS 2022. (link) (video)

    Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

  • Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective, ICML 2023 Workshop (Contributed Talk). (link)

    Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki.

  • Feature purification: How adversarial training performs robust deep learning, FOCS 2021. (link)

    Zeyuan Allen-Zhu, Yuanzhi Li

  • Toward understanding the feature learning process of self-supervised contrastive learning, ICML 2021. (link)

    Zixin Wen, Yuanzhi Li

  • Towards Understanding Mixture of Experts in Deep Learning, NeurIPS 2022. (link)

    Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li

  • Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization, ICLR 2023 (link)

    Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

  • Towards Understanding Feature Learning in Out-of-Distribution Generalization, NeurIPS 2023 (link)

    Wei Huang*, Yongqiang Chen*, Kaiwen Zhou*, Yatao Bian, Bo Han, James Cheng

  • Benign Overfitting for Two-layer ReLU Networks, ICML 2023. (link)

    Yiwen Kou*, Zixiang Chen*, Yuanzhou Chen, Quanquan Gu

  • Data Augmentation as Feature Manipulation, ICML 2022. (link)

    Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

  • Towards understanding how momentum improves generalization in deep learning, ICML 2022. (link)

    Samy Jelassi, Yuanzhi Li

  • The Benefits of Mixup for Feature Learning. (link)

    Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

  • Pruning Before Training May Improve Generalization, Provably. (link)

    Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang

  • Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup, ICML 2023. (link)

    Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

  • How Does Semi-supervised Learning with Pseudo-Labelers Work? A Case Study, ICLR 2023. (link)

    Yiwen Kou, Zixiang Chen, Yuan Cao, Quanquan Gu

  • Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels, NeurIPS 2021. (link)

    Stefani Karp, Ezra Winston, Yuanzhi Li, Aarti Singh

  • Provable Guarantees for Neural Networks via Gradient Feature Learning, NeurIPS 2023. (link)

    Zhenmei Shi*, Junyi Wei*, Yingyu Liang

  • Robust Learning with Progressive Data Expansion Against Spurious Correlation, NeurIPS 2023. (link)

    Yihe Deng, Yu Yang, Baharan Mirzasoleiman, Quanquan Gu

  • Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP, (link)

    Zixiang Chen*, Yihe Deng*, Yuanzhi Li, Quanquan Gu

  • Why Does Sharpness-Aware Minimization Generalize Better Than SGD? NeurIPS 2023, (link)

    Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu

  • Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data, COLT 2022, (link)

    Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

  • Random Feature Amplification: Feature Learning and Generalization in Neural Networks, JMLR 2023, (link)

    Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

  • Benign Overfitting in Adversarially Robust Linear Classification, UAI 2023, (link)

    Jinghui Chen, Yuan Cao, Quanquan Gu

  • Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates, ICLR 2024, (link)

    Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou

  • Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory, ICLR 2024, (link)

    Wei Huang, Ye Shi, Zhongyi Cai, Taiji Suzuki

  • Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data, ICLR 2024, (link)

    Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu

  • Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data. (link)

    Xuran Meng, Difan Zou, Yuan Cao

  • SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem, ICLR 2024, (link)

    Margalit Glasgow

  • Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond, NeurIPS 2023, (link)

    Taiji Suzuki, Denny Wu, Kazusato Oko, Atsushi Nitanda

  • Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data, ICLR 2024, (link)

    Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, Denny Wu

  • What Improves the Generalization of Graph Transformer? A Theoretical Dive into Self-attention and Positional Encoding, (link)

    Hongkang Li, Meng Wang, Tengfei Ma, Sijia Liu, Zaixi Zhang, Pin-Yu Chen

  • Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks, ICLR 2023, (link)

    Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Regression

  • Feature Learning in Infinite-Width Neural Networks, ICML 2021. (link)

    Greg Yang, Edward J. Hu

  • High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation, NeurIPS 2022. (link)

    Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

  • Gradient-Based Feature Learning under Structured Data, NeurIPS 2023. (link)

    Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

  • Neural Networks can Learn Representations with Gradient Descent, COLT 2022. (link)

    Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

  • Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks. (link)

    Eshaan Nichani, Alex Damian, Jason D. Lee

  • The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks, COLT 2022. (link)

    Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

  • Neural Networks Efficiently Learn Low-Dimensional Representations with SGD, ICLR 2023. (link)

    Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu

  • Learning Two-Layer Neural Networks, One (Giant) Step at a Time (link)

    Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

  • Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime, ICLR 2024, (link)

    Keita Suzuki, Taiji Suzuki

Transformers

  • Vision Transformers provably learn spatial structure, NeurIPS 2022. (link)

    Samy Jelassi, Michael E. Sander, Yuanzhi Li

  • A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity, ICLR 2023. (link)

    Hongkang Li, Meng Wang, Sijia Liu, Pin-yu Chen

  • Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer, NeurIPS 2023 (link)

    Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du

  • JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

    Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du

  • On the Role of Attention in Prompt-tuning, ICML 2023 (link)

    Samet Oymak*, Ankit Singh Rawat*, Mahdi Soltanolkotabi*, Christos Thrampoulidis*

  • In-Context Convergence of Transformers, NeurIPS 2023 workshop (link)

    Yu Huang, Yuan Cheng, Yingbin Liang

  • Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis, (link)

    Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Contact

For any queries, please open an issue or reach out to us via email at weihuang.uts@gmail.com.

Code of Conduct

We aim to maintain a respectful and inclusive environment for everyone, and we expect all participants to uphold this standard.

We look forward to your active participation and happy reading!