Reading List for Machine Learning

Reading list for ML (from AI602: Advanced Deep Learning, KAIST, by Professor Sung Ju Hwang, sjhwang82@kaist.ac.kr)

Reading List

Bayesian Deep Learning

[Kingma and Welling 14] Auto-Encoding Variational Bayes, ICLR 2014.
[Kingma et al. 15] Variational Dropout and the Local Reparameterization Trick, NIPS 2015.
[Blundell et al. 15] Weight Uncertainty in Neural Networks, ICML 2015.
[Gal and Ghahramani 16] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016.
[Liu and Wang 16] Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, NIPS 2016.
[Mandt et al. 17] Stochastic Gradient Descent as Approximate Bayesian Inference, JMLR 2017.
[Kendall and Gal 17] What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NIPS 2017.
[Gal et al. 17] Concrete Dropout, NIPS 2017.
[Gal et al. 17] Deep Bayesian Active Learning with Image Data, ICML 2017.
[Teye et al. 18] Bayesian Uncertainty Estimation for Batch Normalized Deep Networks, ICML 2018.
[Garnelo et al. 18] Conditional Neural Processes, ICML 2018.
[Kim et al. 19] Attentive Neural Processes, ICLR 2019.
[Sun et al. 19] Functional Variational Bayesian Neural Networks, ICLR 2019.

[Louizos et al. 19] The Functional Neural Process, NeurIPS 2019.
[Amersfoort et al. 20] Uncertainty Estimation Using a Single Deep Deterministic Neural Network, ICML 2020.
[Dusenberry et al. 20] Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors, ICML 2020.
[Wenzel et al. 20] How Good is the Bayes Posterior in Deep Neural Networks Really?, ICML 2020.
[Lee et al. 20] Bootstrapping Neural Processes, arXiv preprint, 2020.

Deep Generative Models

VAEs, Autoregressive and Flow-Based Generative Models

[Rezende and Mohamed 15] Variational Inference with Normalizing Flows, ICML 2015.
[Germain et al. 15] MADE: Masked Autoencoder for Distribution Estimation, ICML 2015.
[Kingma et al. 16] Improved Variational Inference with Inverse Autoregressive Flow, NIPS 2016.
[Oord et al. 16] Pixel Recurrent Neural Networks, ICML 2016.
[Dinh et al. 17] Density Estimation Using Real NVP, ICLR 2017.
[Papamakarios et al. 17] Masked Autoregressive Flow for Density Estimation, NIPS 2017.
[Huang et al. 18] Neural Autoregressive Flows, ICML 2018.
[Kingma and Dhariwal 18] Glow: Generative Flow with Invertible 1x1 Convolutions, NeurIPS 2018.
[Ho et al. 19] Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design, ICML 2019.

[Chen et al. 19] Residual Flows for Invertible Generative Modeling, NeurIPS 2019.
[Tran et al. 19] Discrete Flows: Invertible Generative Models of Discrete Data, NeurIPS 2019.
[Ping et al. 20] WaveFlow: A Compact Flow-based Model for Raw Audio, ICML 2020.
[Vahdat and Kautz 20] NVAE: A Deep Hierarchical Variational Autoencoder, arXiv preprint, 2020.

Generative Adversarial Networks

[Goodfellow et al. 14] Generative Adversarial Nets, NIPS 2014.
[Radford et al. 15] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
[Chen et al. 16] InfoGAN: Interpreting Representation Learning by Information Maximizing Generative Adversarial Nets, NIPS 2016.
[Arjovsky et al. 17] Wasserstein Generative Adversarial Networks, ICML 2017.
[Zhu et al. 17] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
[Zhang et al. 17] Adversarial Feature Matching for Text Generation, ICML 2017.
[Karras et al. 18] Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR 2018.
[Brock et al. 19] Large Scale GAN Training for High-Fidelity Natural Image Synthesis, ICLR 2019.
[Karras et al. 19] A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019.
[Xu et al. 19] Modeling Tabular Data using Conditional GAN, NeurIPS 2019.

[Karras et al. 20] Analyzing and Improving the Image Quality of StyleGAN, CVPR 2020.
[Zhao et al. 20] Feature Quantization Improves GAN Training, ICML 2020.
[Sinha et al. 20] Small-GAN: Speeding up GAN Training using Core-Sets, ICML 2020.

Deep Reinforcement Learning

[Mnih et al. 13] Playing Atari with Deep Reinforcement Learning, NIPS Deep Learning Workshop 2013.
[Silver et al. 14] Deterministic Policy Gradient Algorithms, ICML 2014.
[Schulman et al. 15] Trust Region Policy Optimization, ICML 2015.
[Lillicrap et al. 16] Continuous Control with Deep Reinforcement Learning, ICLR 2016.
[Schaul et al. 16] Prioritized Experience Replay, ICLR 2016.
[Wang et al. 16] Dueling Network Architectures for Deep Reinforcement Learning, ICML 2016.
[Mnih et al. 16] Asynchronous Methods for Deep Reinforcement Learning, ICML 2016.
[Schulman et al. 17] Proximal Policy Optimization Algorithms, arXiv preprint, 2017.
[Nachum et al. 18] Data-Efficient Hierarchical Reinforcement Learning, NeurIPS 2018.
[Ha and Schmidhuber 18] Recurrent World Models Facilitate Policy Evolution, NeurIPS 2018.
[Burda et al. 19] Large-Scale Study of Curiosity-Driven Learning, ICLR 2019.
[Vinyals et al. 19] Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 2019.
[Bellemare et al. 19] A Geometric Perspective on Optimal Representations for Reinforcement Learning, NeurIPS 2019.
[Janner et al. 19] When to Trust Your Model: Model-Based Policy Optimization, NeurIPS 2019.
[Fellows et al. 19] VIREL: A Variational Inference Framework for Reinforcement Learning, NeurIPS 2019.

[Kaiser et al. 20] Model-Based Reinforcement Learning for Atari, ICLR 2020.
[Agarwal et al. 20] An Optimistic Perspective on Offline Reinforcement Learning, ICML 2020.
[Fedus et al. 20] Revisiting Fundamentals of Experience Replay, ICML 2020.
[Lee et al. 20] Batch Reinforcement Learning with Hyperparameter Gradients, ICML 2020.
[Raileanu et al. 20] Automatic Data Augmentation for Generalization in Deep Reinforcement Learning, arXiv preprint, 2020.

Memory and Computation-Efficient Deep Learning

[Han et al. 15] Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015.
[Wen et al. 16] Learning Structured Sparsity in Deep Neural Networks, NIPS 2016.
[Han et al. 16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016.
[Molchanov et al. 17] Variational Dropout Sparsifies Deep Neural Networks, ICML 2017.
[Louizos et al. 17] Bayesian Compression for Deep Learning, NIPS 2017.
[Louizos et al. 18] Learning Sparse Neural Networks Through L0 Regularization, ICLR 2018.
[Howard et al. 17] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv preprint, 2017.
[Frankle and Carbin 19] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, ICLR 2019.
[Lee et al. 19] SNIP: Single-Shot Network Pruning Based On Connection Sensitivity, ICLR 2019.
[Liu et al. 19] Rethinking the Value of Network Pruning, ICLR 2019.
[Jung et al. 19] Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss, CVPR 2019.
[Morcos et al. 19] One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers, NeurIPS 2019.

[Renda et al. 20] Comparing Rewinding and Fine-tuning in Neural Network Pruning, ICLR 2020.
[Ye et al. 20] Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection, ICML 2020.
[Frankle et al. 20] Linear Mode Connectivity and the Lottery Ticket Hypothesis, ICML 2020.
[Li et al. 20] Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, ICML 2020.
[Nagel et al. 20] Up or Down? Adaptive Rounding for Post-Training Quantization, ICML 2020.
[Meng et al. 20] Training Binary Neural Networks using the Bayesian Learning Rule, ICML 2020.

Meta Learning

[Santoro et al. 16] Meta-Learning with Memory-Augmented Neural Networks, ICML 2016.
[Vinyals et al. 16] Matching Networks for One Shot Learning, NIPS 2016.
[Edwards and Storkey 17] Towards a Neural Statistician, ICLR 2017.
[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017.
[Snell et al. 17] Prototypical Networks for Few-shot Learning, NIPS 2017.
[Nichol et al. 18] On First-Order Meta-learning Algorithms, arXiv preprint, 2018.
[Lee and Choi 18] Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace, ICML 2018.
[Liu et al. 19] Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning, ICLR 2019.
[Gordon et al. 19] Meta-Learning Probabilistic Inference for Prediction, ICLR 2019.
[Ravi and Beatson 19] Amortized Bayesian Meta-Learning, ICLR 2019.
[Rakelly et al. 19] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, ICML 2019.
[Shu et al. 19] Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting, NeurIPS 2019.
[Finn et al. 19] Online Meta-Learning, ICML 2019.
[Lee et al. 20] Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks, ICLR 2020.

[Yin et al. 20] Meta-Learning without Memorization, ICLR 2020.
[Iakovleva et al. 20] Meta-Learning with Shared Amortized Variational Inference, ICML 2020.
[Bronskill et al. 20] TaskNorm: Rethinking Batch Normalization for Meta-Learning, ICML 2020.

Continual Learning

[Rusu et al. 16] Progressive Neural Networks, arXiv preprint, 2016.
[Kirkpatrick et al. 17] Overcoming catastrophic forgetting in neural networks, PNAS 2017.
[Lee et al. 17] Overcoming Catastrophic Forgetting by Incremental Moment Matching, NIPS 2017.
[Shin et al. 17] Continual Learning with Deep Generative Replay, NIPS 2017.
[Lopez-Paz and Ranzato 17] Gradient Episodic Memory for Continual Learning, NIPS 2017.
[Yoon et al. 18] Lifelong Learning with Dynamically Expandable Networks, ICLR 2018.
[Nguyen et al. 18] Variational Continual Learning, ICLR 2018.
[Schwarz et al. 18] Progress & Compress: A Scalable Framework for Continual Learning, ICML 2018.
[Chaudhry et al. 19] Efficient Lifelong Learning with A-GEM, ICLR 2019.

[Rao et al. 19] Continual Unsupervised Representation Learning, NeurIPS 2019.
[Rolnick et al. 19] Experience Replay for Continual Learning, NeurIPS 2019.
[Jerfel et al. 19] Reconciling Meta-Learning and Continual Learning with Online Mixtures of Tasks, NeurIPS 2019.
[Yoon et al. 20] Scalable and Order-robust Continual Learning with Additive Parameter Decomposition, ICLR 2020.
[Knoblauch et al. 20] Optimal Continual Learning has Perfect Memory and is NP-hard, ICML 2020.
[Ramasesh et al. 20] Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics, Continual Learning Workshop, ICML 2020.

Interpretable Deep Learning

[Ribeiro et al. 16] "Why Should I Trust You?" Explaining the Predictions of Any Classifier, KDD 2016.
[Kim et al. 16] Examples are not Enough, Learn to Criticize! Criticism for Interpretability, NIPS 2016.
[Choi et al. 16] RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism, NIPS 2016.
[Koh and Liang 17] Understanding Black-box Predictions via Influence Functions, ICML 2017.
[Bau et al. 17] Network Dissection: Quantifying Interpretability of Deep Visual Representations, CVPR 2017.
[Selvaraju et al. 17] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017.
[Kim et al. 18] Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), ICML 2018.
[Heo et al. 18] Uncertainty-Aware Attention for Reliable Interpretation and Prediction, NeurIPS 2018.
[Bau et al. 19] GAN Dissection: Visualizing and Understanding Generative Adversarial Networks, ICLR 2019.

[Ghorbani et al. 19] Towards Automatic Concept-based Explanations, NeurIPS 2019.
[Coenen et al. 19] Visualizing and Measuring the Geometry of BERT, NeurIPS 2019.
[Heo et al. 20] Cost-Effective Interactive Attention Learning with Neural Attention Processes, ICML 2020.
[Agarwal et al. 20] Neural Additive Models: Interpretable Machine Learning with Neural Nets, arXiv preprint, 2020.

Reliable Deep Learning

[Guo et al. 17] On Calibration of Modern Neural Networks, ICML 2017.
[Lakshminarayanan et al. 17] Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, NIPS 2017.
[Liang et al. 18] Enhancing the Reliability of Out-of-distribution Image Detection in Neural Networks, ICLR 2018.
[Lee et al. 18] Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples, ICLR 2018.
[Kuleshov et al. 18] Accurate Uncertainties for Deep Learning Using Calibrated Regression, ICML 2018.
[Jiang et al. 18] To Trust Or Not To Trust A Classifier, NeurIPS 2018.
[Madras et al. 18] Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer, NeurIPS 2018.
[Maddox et al. 19] A Simple Baseline for Bayesian Uncertainty in Deep Learning, NeurIPS 2019.

[Kull et al. 19] Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration, NeurIPS 2019.
[Thulasidasan et al. 19] On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks, NeurIPS 2019.
[Ovadia et al. 19] Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift, NeurIPS 2019.
[Hendrycks et al. 20] AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty, ICLR 2020.
[Filos et al. 20] Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?, ICML 2020.

Deep Adversarial Learning

[Szegedy et al. 14] Intriguing Properties of Neural Networks, ICLR 2014.
[Goodfellow et al. 15] Explaining and Harnessing Adversarial Examples, ICLR 2015.
[Kurakin et al. 17] Adversarial Machine Learning at Scale, ICLR 2017.
[Madry et al. 18] Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018.
[Eykholt et al. 18] Robust Physical-World Attacks on Deep Learning Visual Classification, CVPR 2018.
[Athalye et al. 18] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, ICML 2018.
[Zhang et al. 19] Theoretically Principled Trade-off between Robustness and Accuracy, ICML 2019.
[Carmon et al. 19] Unlabeled Data Improves Adversarial Robustness, NeurIPS 2019.
[Ilyas et al. 19] Adversarial Examples are not Bugs, They Are Features, NeurIPS 2019.
[Li et al. 19] Certified Adversarial Robustness with Additive Noise, NeurIPS 2019.
[Tramèr and Boneh 19] Adversarial Training and Robustness for Multiple Perturbations, NeurIPS 2019.
[Shafahi et al. 19] Adversarial Training for Free!, NeurIPS 2019.

[Wong et al. 20] Fast is Better Than Free: Revisiting Adversarial Training, ICLR 2020.
[Madaan et al. 20] Adversarial Neural Pruning with Latent Vulnerability Suppression, ICML 2020.
[Maini et al. 20] Adversarial Robustness Against the Union of Multiple Perturbation Models, ICML 2020.

Graph Neural Networks

[Li et al. 16] Gated Graph Sequence Neural Networks, ICLR 2016.
[Hamilton et al. 17] Inductive Representation Learning on Large Graphs, NIPS 2017.
[Kipf and Welling 17] Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017.
[Velickovic et al. 18] Graph Attention Networks, ICLR 2018.
[Ying et al. 18] Hierarchical Graph Representation Learning with Differentiable Pooling, NeurIPS 2018.
[Xu et al. 19] How Powerful are Graph Neural Networks?, ICLR 2019.
[Maron et al. 19] Provably Powerful Graph Networks, NeurIPS 2019.
[Yun et al. 19] Graph Transformer Networks, NeurIPS 2019.

[Loukas 20] What Graph Neural Networks Cannot Learn: Depth vs Width, ICLR 2020.
[Bianchi et al. 20] Spectral Clustering with Graph Neural Networks for Graph Pooling, ICML 2020.
[Xhonneux et al. 20] Continuous Graph Neural Networks, ICML 2020.
[Garg et al. 20] Generalization and Representational Limits of Graph Neural Networks, ICML 2020.
[Bécigneul et al. 20] Optimal Transport Graph Neural Networks, arXiv preprint, 2020.

Neural Architecture Search

[Zoph and Le 17] Neural Architecture Search with Reinforcement Learning, ICLR 2017.
[Baker et al. 17] Designing Neural Network Architectures using Reinforcement Learning, ICLR 2017.
[Real et al. 17] Large-Scale Evolution of Image Classifiers, ICML 2017.
[Liu et al. 18] Hierarchical Representations for Efficient Architecture Search, ICLR 2018.
[Pham et al. 18] Efficient Neural Architecture Search via Parameters Sharing, ICML 2018.
[Luo et al. 18] Neural Architecture Optimization, NeurIPS 2018.
[Liu et al. 19] DARTS: Differentiable Architecture Search, ICLR 2019.
[Tan et al. 19] MnasNet: Platform-Aware Neural Architecture Search for Mobile, CVPR 2019.
[Cai et al. 19] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019.
[Zhou et al. 19] BayesNAS: A Bayesian Approach for Neural Architecture Search, ICML 2019.
[Tan and Le 19] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019.
[Guo et al. 19] NAT: Neural Architecture Transformer for Accurate and Compact Architectures, NeurIPS 2019.
[Chen et al. 19] DetNAS: Backbone Search for Object Detection, NeurIPS 2019.
[Dong and Yang 20] NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, ICLR 2020.
[Zela et al. 20] Understanding and Robustifying Differentiable Architecture Search, ICLR 2020.

[Such et al. 20] Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data, ICML 2020.
[Li et al. 20] Neural Architecture Search in A Proxy Validation Loss Landscape, ICML 2020.

Federated Learning

[Konečný et al. 16] Federated Optimization: Distributed Machine Learning for On-Device Intelligence, arXiv preprint, 2016.
[Konečný et al. 16] Federated Learning: Strategies for Improving Communication Efficiency, NIPS Workshop on Private Multi-Party Machine Learning 2016.
[McMahan et al. 17] Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017.
[Smith et al. 17] Federated Multi-Task Learning, NIPS 2017.
[Li et al. 20] Federated Optimization in Heterogeneous Networks, MLSys 2020.

[Yurochkin et al. 19] Bayesian Nonparametric Federated Learning of Neural Networks, ICML 2019.
[Bonawitz et al. 19] Towards Federated Learning at Scale: System Design, MLSys 2019.
[Wang et al. 20] Federated Learning with Matched Averaging, ICLR 2020.
[Li et al. 20] On the Convergence of FedAvg on Non-IID data, ICLR 2020.

[Karimireddy et al. 20] SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, ICML 2020.
[Yu et al. 20] Federated Learning with Only Positive Labels, ICML 2020.
[Hamer et al. 20] FedBoost: Communication-Efficient Algorithms for Federated Learning, ICML 2020.
[Rothchild et al. 20] FetchSGD: Communication-Efficient Federated Learning with Sketching, ICML 2020.
[Pathak and Wainwright 20] FedSplit: An Algorithmic Framework for Fast Federated Optimization, NeurIPS 2020.

Self-Supervised Learning

[Dosovitskiy et al. 14] Discriminative Unsupervised Feature Learning with Convolutional Neural Networks, NIPS 2014.
[Pathak et al. 16] Context Encoders: Feature Learning by Inpainting, CVPR 2016.
[Noroozi and Favaro 16] Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, ECCV 2016.
[Gidaris et al. 18] Unsupervised Representation Learning by Predicting Image Rotations, ICLR 2018.
[He et al. 20] Momentum Contrast for Unsupervised Visual Representation Learning, CVPR 2020.
[Chen et al. 20] A Simple Framework for Contrastive Learning of Visual Representations, ICML 2020.
[Mikolov et al. 13] Efficient Estimation of Word Representations in Vector Space, ICLR Workshop 2013.
[Devlin et al. 19] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019.
[Clark et al. 20] ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, ICLR 2020.
[Hu et al. 20] Strategies for Pre-training Graph Neural Networks, ICLR 2020.

[Chen et al. 20] Generative Pretraining from Pixels, ICML 2020.
[Grill et al. 20] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, arXiv preprint, 2020.
[Chen et al. 20] Big Self-Supervised Models are Strong Semi-Supervised Learners, arXiv preprint, 2020.
[Laskin et al. 20] CURL: Contrastive Unsupervised Representations for Reinforcement Learning, ICML 2020.

junmokane/ReadingList