Deep Learning Roadmap

My own deep learning mastery roadmap, inspired by Deep Learning Papers Reading Roadmap.

There are some differences, customized to my own needs:

  • not only academic papers but also blog posts, online courses, and other references are included
  • customized for my own plans - may not include RL, NLP, etc.
  • updated for 2019 SOTA

Introductory Courses

Basic CNN Architectures

  • AlexNet (2012) [paper]
    • Alex Krizhevsky et al. "ImageNet Classification with Deep Convolutional Neural Networks"
  • ZFNet (2013) [paper]
    • Zeiler et al. "Visualizing and Understanding Convolutional Networks"
  • VGG (2014)
    • Simonyan et al. "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014) [Google DeepMind & Oxford's Visual Geometry Group (VGG)] [paper]
    • Related (speeding up VGG-style networks): Zhang et al. "Accelerating Very Deep Convolutional Networks for Classification and Detection" [paper]
  • GoogLeNet, a.k.a Inception v.1 (2014) [paper]
    • Szegedy et al. "Going Deeper with Convolutions" [Google]
    • Original LeNet page from Yann LeCun's homepage.
    • Inception v.2 and v.3 (2015) Szegedy et al. "Rethinking the Inception Architecture for Computer Vision" [paper]
    • Inception v.4 and InceptionResNet (2016) Szegedy et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" [paper]
    • "A Simple Guide to the Versions of the Inception Network" [blogpost]
  • ResNet (2015) [paper]
    • He et al. "Deep Residual Learning for Image Recognition"
  • Xception (2016) [paper]
    • Chollet, Francois - "Xception: Deep Learning with Depthwise Separable Convolutions"
  • MobileNet (2016) [paper]
    • Howard et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications"
    • A nice paper about reducing CNN parameter sizes while maintaining performance.
  • DenseNet (2016) [paper]
    • Huang et al. "Densely Connected Convolutional Networks"

Generative adversarial networks

  • GAN (2014.6) [paper]
    • Goodfellow et al. "Generative Adversarial Networks"
  • DCGAN (2015.11) [paper]
    • Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
  • Info GAN (2016.6) [paper]
    • Chen et al. "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
  • Improved Techniques for Training GANs (2016.6) [paper]
    • Salimans et al. "Improved Techniques for Training GANs"
    • Suggests multiple GAN training techniques such as feature matching, minibatch discrimination, one-sided label smoothing, and virtual batch normalization.
    • It also proposes a now widely used generator performance metric, the Inception Score (see the Inception Score sketch after this list).
  • f-GAN (2016.6) [paper]
    • Nowozin et al. "f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization"
  • Unrolled GAN (2016.7) [paper]
    • Metz et al. "Unrolled Generative Adversarial Networks"
  • ACGAN (2016.10) [paper]
    • Odena et al. "Conditional Image Synthesis With Auxiliary Classifier GANs"
  • LSGAN (2016.11) [paper]
    • Mao et al. "Least Squares Generative Adversarial Networks"
  • Pix2Pix (2016.11) [paper]
    • Isola et al. "Image-to-Image Translation with Conditional Adversarial Networks"
  • EBGAN (2016.11) [paper]
    • Zhao et al. "Energy-based Generative Adversarial Network"
  • WGAN (2017.4) [paper]
    • Arjovsky et al., "Wasserstein GAN"
  • WGAN_GP (2017.5) [paper]
    • Gulrajani et al., "Improved Training of Wasserstein GANs"
    • Improves training stability by adding a "gradient penalty (GP)" term to the critic loss (see the gradient-penalty sketch after this list)
  • BEGAN (2017.5) [paper]
    • Berthelot et al. "BEGAN: Boundary Equilibrium Generative Adversarial Networks"
    • Introduces a diversity ratio (an equilibrium hyperparameter) that controls the trade-off between sample variety and quality, and derives a convergence measure from it.
  • CycleGAN (2017.5) [paper]
    • Zhu et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"
  • DiscoGAN (2017.5) [paper]
    • Kim et al. "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"
    • DiscoGAN and CycleGAN propose essentially the same cycle-consistency technique for GAN-based style transfer, developed independently at around the same time.
  • Fréchet Inception Distance (FID) (2017.6) [paper]
    • Heusel et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
    • The paper's main contribution is the Two Time-Scale Update Rule (TTUR), but it is best known for the Fréchet Inception Distance, a metric that measures the distance between the distributions of Inception activations of real and generated samples (see the FID sketch after this list).
  • ProGAN (2017.10) [paper]
    • Karras et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation"
  • PacGAN (2017.12) [paper]
    • Lin et al. "PacGAN: The power of two samples in generative adversarial networks"
  • BigGAN (2018) [paper]
    • Brock et al. "Large Scale GAN Training for High Fidelity Natural Image Synthesis"
  • GauGAN (2019.3) [paper]
    • Park et al. "Semantic Image Synthesis with Spatially-Adaptive Normalization"
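
As a concrete reference for the Inception Score mentioned under Salimans et al. above, here is a minimal NumPy sketch of the metric computed from Inception-network class probabilities of generated samples; the `softmax_probs` input is an assumed placeholder, not code from the paper.

```python
import numpy as np

def inception_score(softmax_probs, eps=1e-12):
    """Inception Score from an (N, num_classes) array whose rows are p(y|x).

    IS = exp( E_x[ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal class
    distribution over generated samples. Higher is better (confident and diverse).
    """
    p_yx = np.asarray(softmax_probs, dtype=np.float64)
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Example call shape (random probabilities, so the score will be near 1):
# scores = inception_score(np.random.dirichlet(np.ones(1000), size=5000))
```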
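
The gradient penalty from Gulrajani et al. can likewise be sketched in a few lines. This is a minimal PyTorch-style illustration, assuming `critic`, `real`, and `fake` are a critic module and two same-shaped batches (hypothetical names), not a reference implementation.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP term: lambda * (||grad_x_hat critic(x_hat)||_2 - 1)^2 on interpolates."""
    batch_size = real.size(0)
    # Random interpolation between real and generated samples
    alpha = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_hat = critic(x_hat)
    grads = torch.autograd.grad(
        outputs=d_hat, inputs=x_hat,
        grad_outputs=torch.ones_like(d_hat),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```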
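
And for the Fréchet Inception Distance from Heusel et al., a small NumPy/SciPy sketch, assuming `acts_real` and `acts_fake` are (N, D) arrays of Inception activations for real and generated images:

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(acts_real, acts_fake):
    """FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^(1/2)). Lower is better."""
    mu_r, mu_f = acts_real.mean(axis=0), acts_fake.mean(axis=0)
    cov_r = np.cov(acts_real, rowvar=False)
    cov_f = np.cov(acts_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```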

Advanced GANs

  • DRAGAN (2017.5) [paper]
    • Kodali et al. "On Convergence and Stability of GANs"
  • Are GANs Created Equal? (2017.11) [paper]
    • Lucic et al. "Are GANs Created Equal? A Large-Scale Study"
  • SGAN (2017.12) [paper]
    • Chavdarova et al. "SGAN: An Alternative Training of Generative Adversarial Networks"
  • MaskGAN (2018.1) [paper]
    • Fedus et al. "MaskGAN: Better Text Generation via Filling in the _____"
  • Spectral Normalization (2018.2) [paper]
    • Miyato et al. "Spectral Normalization for Generative Adversarial Networks"
  • SAGAN (2018.5) [paper] [tensorflow]
    • Zhang et al. "Self-Attention Generative Adversarial Networks"
  • The Unusual Effectiveness of Averaging in GAN Training (2018) [paper]
    • "Benefiting from training on past snapshots."
    • Keeps an exponential moving average (EMA) of the generator weights for evaluation (see the EMA sketch after this list).
  • Disconnected Manifold Learning (2018.6) [paper]
    • Khayatkhoei, et al. "Disconnected Manifold Learning for Generative Adversarial Networks"
  • A Note on the Inception Score (2018.6) [paper]
    • Barratt et al., "A Note on the Inception Score"
  • Which Training Methods for GAN do actually converge? (2018.7) [paper]
    • Mescheder et al., "Which Training Methods for GANs do actually Converge?"
  • GAN Dissection (2018.11) [paper]
    • Bau et al. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks"
  • Improving Generalization and Stability for GANs (2019.2) [paper]
    • Thanh-Tung et al., "Improving Generalization and Stability of Generative Adversarial Networks"
  • Augustus Odena - "Open Questions about GANs" (2019.4) [distill.pub]
    • A very nice article on the current state of GAN research and the open problems that remain.
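
A minimal sketch of the snapshot averaging mentioned above: keep an exponential moving average of the generator's parameters and evaluate with the averaged copy. Framework-agnostic over plain dicts of arrays; all names are illustrative.

```python
def update_ema(ema_params, current_params, decay=0.999):
    """EMA update applied after every training step: ema <- decay*ema + (1-decay)*current."""
    for name, value in current_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params

# Usage sketch:
#   ema = {k: v.copy() for k, v in generator_params.items()}  # initialize from live weights
#   ema = update_ema(ema, generator_params)                   # call after each optimizer step
#   # evaluate / generate samples with `ema`, not the raw generator weights
```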

Autoencoders

  • Original autoencoder (1986) [paper]
    • Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation"
  • AutoEncoder (2006) [science]
    • Hinton et al., "Reducing the Dimensionality of Data with Neural Networks"
  • Denoising Autoencoders (2008) [paper]
    • Vincent et al. "Extracting and Composing Robust Features with Denoising Autoencoders"
  • Wasserstein Autoencoder (2017) [paper]
    • Tolstikhin et al. "Wasserstein Auto-Encoders"

Autoregressive models

  • PixelCNN (2016) [paper]
    • van den Oord et al. "Conditional image generation with PixelCNN decoders."
  • WaveNet (2016) [paper]
    • van den Oord et al. "WaveNet: A Generative Model for Raw Audio"
  • tacotron?

Normalization Layers

  • Batch Normalization (2015.2) [paper]
    • Ioffe et al. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
  • Group Normalization (2018.3)
    • Wu et al. "Group Normalization"
  • Instance Normalization (2016.7) [paper]
    • Ulyanov et al. "Instance Normalization: The Missing Ingredient for Fast Stylization"
  • Santurkar et al. "How does Batch Normalization help Optimization?" (2018.5) [paper]
  • Switchable Normalization (2019) [paper]
    • Luo et al. "Differentiable Learning-to-Normalize via Switchable Normalization"
  • Weight Standardization (2019.3) [paper]
    • Qiao et al. "Weight Standardization"

Initializations

  • Xavier Initialization (2010) [paper]
    • Glorot et al., "Understanding the difficulty of training deep feedforward neural networks"
  • Kaiming (He) Initialization (2015.2) [paper]
    • He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"
  • All you need is a good init (2015.11) [paper]
    • Mishkin et al., "All you need is a good init"
  • All you need is beyond a good init (2017.4) [paper]
    • Xie et al. "All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation"

Dropouts

  • Dropout (2014) [paper]
    • Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
  • Inverted Dropouts [notes on CS231n]
    • Multiplies activations by the inverse of keep_prob during training, so that activations at inference (test) time are consistent without extra scaling (see the sketch after this list).
  • Li et al., "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift" (2018.1) [paper]
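
A minimal NumPy sketch of the inverted dropout trick described in the CS231n notes: the 1/keep_prob scaling is applied at training time, so the test-time forward pass needs no change.

```python
import numpy as np

def inverted_dropout(x, keep_prob=0.8, training=True):
    """Zero out units with probability (1 - keep_prob) and scale survivors by 1/keep_prob."""
    if not training:
        return x  # test time: identity, no rescaling needed
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask
```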

Meta-Learning / Representation Learning (Zero-Shot learning, Few-Shot learning)

  • Zero-Data Learning (2008) [paper]
    • Larochelle et al., "Zero-data Learning of New Tasks"
  • Palatucci et al., "Zero-shot Learning with Semantic Output Codes" (NIPS 2009) [paper]
  • Socher et al., "Zero-Shot Learning Through Cross-Modal Transfer" (2013.1) [paper]
  • Lampert et al., "Attribute-Based Classification for Zero-Shot Visual Object Categorization" (2013.7) [paper]
  • Dinu et al., "Improving zero-shot learning by mitigating the hubness problem" (2014.12) [paper]
  • Romera-Paredes et al. - "An embarrassingly simple approach to zero-shot learning" (2015) [paper]
  • Prototypical Networks (2017.3) [paper]
    • Snell et al., "Prototypical Networks for Few-shot Learning"
  • Zero-shot Learning - the Good, the Bad and the Ugly (2017.3) [paper]
    • Xian et al., "Zero-Shot Learning - The Good, the Bad and the Ugly"
  • In defence of the Triplet Loss (2017.3) [paper]
    • Hermans et al., "In Defense of the Triplet Loss for Person Re-Identification"
  • MAML (2017.3) [paper]
    • Finn et al, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
  • Triplet Loss and Online Triplet Mining in Tensorflow (2018.3) [Oliver Moindrot Blog]
  • Few-Shot learning Survey (2019.4) [paper]
    • Wang et al. "Few-shot Learning: A Survey"

Transfer learning

  • Deep Transfer Learning Survey (2018) [paper]
    • Tan et al. "A Survey on Deep Transfer Learning"

Geometric learning

  • Geometric Deep Learning (2016) [paper]
    • Bronstein et al. "Geometric deep learning: going beyond Euclidean data"

Variational Autoencoders (VAE)

  • VQ-VAE (2017.11) [paper]
    • van den Oord et al., "Neural Discrete Representation Learning"
  • Semi-Amortized Variational Autoencoders (2018.2) [paper]
    • Kim et al. "Semi-Amortized Variational Autoencoders"

Object detection

Semantic Segmentation

Sequential Model

  • Seq2Seq (2014) [paper]
    • Sutskever et al. "Sequence to sequence learning with neural networks."

Neural Turing Machine

  • Neural Turing Machines (2014) [paper]
    • Graves et al., "Neural turing machines."
  • Pointer Networks (2015) [paper]
    • Vinyals et al., "Pointer networks."

Attention / Question-Answering

  • NMT (Neural Machine Translation) (2014) [paper]
    • Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate"
  • Stanford Attentive Reader (2016.6) [paper]
    • Chen et al. "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
  • BiDAF (2016.11) [paper]
    • Seo et al. "Bidirectional Attention Flow for Machine Comprehension"
  • DrQA or Stanford Attentive Reader++ (2017.3) [paper]
    • Chen et al. "Reading Wikipedia to Answer Open-Domain Questions"
  • Transformer (2017.8) [paper] [google ai blog]
    • Vaswani et al. "Attention Is All You Need" (see the scaled dot-product attention sketch after this list)
  • [read] Lilian Weng - "Attention? Attention!" (2018) [blog_post]
    • A nice explanation of the attention mechanism and its related concepts.
  • BERT (2018.10) [paper]
    • Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
  • GPT-2 (2019) [paper (pdf)]
    • Radford et al. "Language Models are Unsupervised Multitask Learners"
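
As a pocket reference for the attention entries above, a minimal NumPy sketch of the scaled dot-product attention at the core of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; this is only an illustration, not the paper's multi-head implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values
```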

Advanced RNNs

Model Compression

  • MobileNet (2016) (see above: Basic CNN Architectures)
  • ShuffleNet (2017)
    • Zhang et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"

Neural Processes

  • Neural Processes (2018) [paper]
    • Garnelo et al. "Neural Processes"
  • Attentive Neural Processes (2019) [paper]
    • Kim et al. "Attentive Neural Processes"
  • A Visual Exploration of Gaussian Processes (2019) [Distill.pub]
    • Not a neural process, but gives very nice intuition about Gaussian processes. A good read.

Self-supervised learning

Data Augmentation

  • Shake Shake Regularization (2017.5) [paper]
    • Gastaldi, Xavier - "Shake-Shake Regularization"

Interpretation and Theory on Generalization, Overfitting, and Learning Capacity

  • MDL (Minimum Description Length)
    • Peter Grunwald - "A tutorial introduction to the minimum description length principle" (2004) [paper]
  • Grunwald et al., - "Shannon Information and Kolmogorov Complexity" (2010) [paper]
  • Dauphin et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization" (2014.6) [paper]
  • Choromanska et al. "The Loss Surfaces of Multilayer Networks" (2014.11) [paper]
    • argues that non-convexity in NNs is not a huge problem
  • Knowledge Distillation (2015.3) [paper]
    • Hinton et al., "Distilling the Knowledge in a Neural Network"
  • 3-Part Learning Theory by Mostafa Samir
  • Deconvolution and Checkerboard Artifacts - Odena (2016) [distill.pub article]
  • Keskar et al. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (2016.9) [paper]
  • Rethinking Generalization (2016.11) [paper]
    • Zhang et al. "Understanding deep learning requires rethinking generalization"
  • Information Bottleneck (2017) [paper] [original paper on information bottleneck (2000)] [youtube-talk] [article in quantamagazine]
    • Shwartz-Ziv and Tishby, "Opening the Black Box of Deep Neural Networks via Information"
  • Neyshabur et al, "Exploring Generalization in Deep Learning" (2017.7) [paper]
  • Sun et al., "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era" (2017.7) [paper]
  • Super-Convergence (2017.8) [paper]
    • Smith et al. - "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates"
  • Don't Decay the Learning Rate, Increase the Batch Size (2017.11) [paper]
    • Smith et al. "Don't Decay the Learning Rate, Increase the Batch Size"
  • Hestness et al. "Deep Learning Scaling is Predictable, Empirically" (2017.12) [paper]
  • Visualizing loss landscape of neural nets (2018) [paper]
  • Olson et al., "Modern Neural Networks Generalize on Small Data Sets" (NeurIPS 2018) [paper]
  • Lottery Ticket Hypothesis (2018.3) [paper]
    • Frankle et al., "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks"
    • Empirically showed that pruning the smallest-magnitude weights after training, rewinding the remaining weights to their initial values, and re-training the pruned network gives results as good as or better than the original (see the sketch after this list).
  • Intrinsic Dimension (2018.4) [paper]
    • Li et al., "Measuring the Intrinsic Dimension of Objective Landscapes"
  • Geirhos et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" (2018.11) [paper]
  • Belkin et al. "Reconciling modern machine learning and the bias-variance trade-off" (2018.12) [paper]
  • Graetz - "How to visualize convolution features in 40 lines of code" (2019) [medium]
  • Geiger et al. "Scaling description of generalization with number of parameters in deep learning" (2019.1) [paper]
  • Are all layers created equal? (2019.2) [paper]
    • Zhang et al. "Are all layers created equal?"
  • Lilian Weng - "Are Deep Neural Networks Dramatically Overfitted?" (2019.4) [lil'log]
    • Excellent article about generalization and overfitting of deep neural networks
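
A rough sketch of one round of the magnitude pruning behind the Lottery Ticket Hypothesis; the outer train/rewind loop is only indicated in comments, and `train` is a hypothetical placeholder for whatever training routine you use.

```python
import numpy as np

def magnitude_prune_mask(weights, old_mask, prune_frac=0.2):
    """Zero out the smallest-magnitude weights among those still surviving."""
    surviving = np.abs(weights[old_mask.astype(bool)])
    threshold = np.quantile(surviving, prune_frac)     # cut the bottom prune_frac
    return old_mask * (np.abs(weights) >= threshold)

# Outer loop sketch:
#   mask = np.ones_like(w_init)
#   for _ in range(rounds):
#       w_trained = train(w_init * mask)                # train the masked network
#       mask = magnitude_prune_mask(w_trained, mask)    # prune smallest surviving weights
#       # rewind: the next round restarts from w_init * mask (the "winning ticket")
```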

Adversarial Attacks and Defense against attacks (RobustML)

  • RobustML site
  • Adversarial Examples (2013.12) [paper]
    • Szegedy et al. "Intriguing Properties of Neural Networks"
    • Induces misclassification by applying small, carefully crafted perturbations to the input
    • This paper was the first to coin the term "adversarial example"
  • Fast Gradient Sign Method (FGSM) (2014.12)
    • Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (ICLR 2015) [paper]
    • This paper presented the famous "panda" example (also seen in the PyTorch tutorial); see the FGSM sketch after this list.
  • Kurakin et al., "Adversarial Machine Learning at Scale" (2016.11) [paper]
  • Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (2017.6) [paper]
  • Carlini et al., "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text" (2018.1) [paper]
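
A minimal PyTorch-style sketch of FGSM as described in Goodfellow et al.: take one step along the sign of the input gradient of the loss. `model` and `loss_fn` are assumed placeholders, and inputs are assumed to live in [0, 1].

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """x_adv = clip(x + epsilon * sign(grad_x loss(model(x), y)), 0, 1)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]  # gradient w.r.t. the input only
    with torch.no_grad():
        x_adv = (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()
```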

Neural architecture search (NAS) and AutoML

  • GREAT AutoML Website [site]
    • They maintain a blog, a list of NAS literature, an analysis page, and a web book.
  • AdaNet (2016.7) [paper] [GoogleAI blog]
    • Cortes et al. "AdaNet: Adaptive Structural Learning of Artificial Neural Networks"
  • NAS (2016.12) [paper]
    • Zoph et al. "Neural Architecture Search with Reinforcement Learning"
  • PNAS (2017.12) [paper]
    • Liu et al. "Progressive Neural Architecture Search"
  • ENAS (2018.2) [paper]
    • Pham et al. "Efficient Neural Architecture Search via Parameter Sharing"
  • DARTS (2018.6) [paper]
    • Liu et al. "DARTS: Differentiable Architecture Search"
    • Uses a continuous relaxation of the discrete architecture search space (see the sketch after this list).
  • RandWire (2019) [paper]
    • Xie et al. "Exploring Randomly Wired Neural Networks for Image Recognition" [Facebook AI Research]
  • A Survey on Neural Architecture Search (2019) [paper]
    • Wistuba et al., "A Survey on Neural Architecture Search"
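
A toy NumPy sketch of the continuous relaxation used by DARTS: each edge of the cell computes a softmax-weighted mixture of candidate operations, so the architecture parameters `alphas` become ordinary differentiable weights. The candidate ops below are illustrative stand-ins, not the paper's search space.

```python
import numpy as np

def mixed_op(x, alphas, ops):
    """Softmax(alphas)-weighted sum over candidate operations on one edge."""
    weights = np.exp(alphas - alphas.max())
    weights /= weights.sum()
    return sum(w * op(x) for w, op in zip(weights, ops))

# Illustrative candidate operations for a single edge:
# ops = [lambda x: x,                  # identity / skip connection
#        lambda x: np.maximum(x, 0),   # ReLU (stand-in for a conv block)
#        lambda x: np.zeros_like(x)]   # "zero" op (no connection)
# y = mixed_op(x, alphas=np.zeros(len(ops)), ops=ops)
```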

Practical Techniques

DL roadmap reference

Theory

Resources

  • A Selective Overview of Deep Learning (2019) [paper]
    • Fan et al. "A Selective Overview of Deep Learning"
    • A nice overview paper on deep learning up to early 2019 (about 30 pages)