Fast.ai Deep Learning from the Foundations (Spring 2019)
Part II of Fast.ai's two-part deep learning course, offered through The Data Institute at USF and running from March through the end of April 2019. My Part I coursework is here.
This course offered a bottom-up approach (through code, not math equations) to becoming an expert deep learning practitioner and experimenter.
We implemented core fastai and PyTorch classes and modules from scratch, achieving similar or better performance. We also practiced implementing techniques introduced in various papers, then spent significant time on strategies for reducing model training time (parallelization, JIT). The final two weeks were a deep dive into Swift for TensorFlow with Chris Lattner, where we saw first-hand how differentiable programming could work.
I came away with both the know-how to engineer cutting-edge deep learning ideas from scratch in optimized code and the expertise to research and explore new ideas of my own.
I implemented the code created by Jeremy Howard and Sylvain Gugger for the course's weekly lectures and reproduced their results. The bulk of my time (about 5 months of full-time study), however, was spent crafting my own plain-English explanations of the techniques and concepts the class covered. I'm proudest of my writing on language model pre-training and the Swift for TensorFlow framework. My write-ups:
- Layer-Sequential Unit-Variance (LSUV) Weight Initialization (sketched in code after this list)
- Building fastai's DataBlock API from Scratch
- Improving PyTorch's Optimizers
- Image Augmentation and PyTorch JIT (see the TorchScript sketch after this list)
- NVIDIA's DALI Batch Image Augmentation Library
- Mixup and Label Smoothing (see the sketch after this list)
- FP16 Training (see the sketch after this list)
- A Flexible & Concise XResNet Implementation
- Transfer Learning from Scratch
- A Survey of Language Model Techniques
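A minimal sketch of the LSUV idea from the first write-up above, assuming a plain PyTorch model and one representative batch (the names and structure here are my own illustration, not the course's actual code):

```python
import torch
import torch.nn as nn

def lsuv_init(model: nn.Module, layers, xb: torch.Tensor,
              tol: float = 1e-3, max_iters: int = 10):
    """Rescale each layer's weights until its activations have unit
    variance (and zero mean, per the fastai variant) on the batch xb."""
    stats = {}

    def hook(module, inp, out):
        stats['mean'], stats['std'] = out.mean().item(), out.std().item()

    for layer in layers:  # the model's Conv2d/Linear modules, in forward order
        handle = layer.register_forward_hook(hook)
        with torch.no_grad():
            for _ in range(max_iters):
                model(xb)  # forward pass fills in this layer's stats
                if abs(stats['std'] - 1.0) < tol:
                    break
                layer.weight.data /= stats['std']      # drive std toward 1
                if layer.bias is not None:
                    layer.bias.data -= stats['mean']   # drive mean toward 0
        handle.remove()
```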
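For the image augmentation and JIT write-up, the core trick is augmenting whole batches on the GPU and compiling the transform with TorchScript so it doesn't pay Python overhead per image. A minimal sketch (the brightness transform and its magnitude parameter are illustrative, not from the course):

```python
import torch

@torch.jit.script
def rand_brightness(batch: torch.Tensor, mag: float = 0.2) -> torch.Tensor:
    # One brightness shift per image, drawn on the same device as the batch,
    # so the whole augmentation runs as a compiled batch-level op.
    shift = (torch.rand([batch.size(0), 1, 1, 1], device=batch.device) - 0.5) * mag
    return (batch + shift).clamp(0.0, 1.0)
```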
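Mixup and label smoothing are both small enough to sketch in a few lines. Assuming a standard classification setup (function names are my own, not fastai's):

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha: float = 0.4):
    """Blend each example with a shuffled partner; Beta(alpha, alpha)
    controls how aggressive the mixing is."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

def smoothed_ce(logits, target, eps: float = 0.1):
    """Label smoothing: (1 - eps) weight on the true class,
    with eps spread uniformly over all classes."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = F.nll_loss(log_probs, target)
    return (1 - eps) * nll - eps * log_probs.mean(dim=-1).mean()

# The mixup loss interpolates the targets the same way as the inputs:
# loss = lam * smoothed_ce(logits, y_a) + (1 - lam) * smoothed_ce(logits, y_b)
```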
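The FP16 write-up follows the standard mixed-precision recipe of the time (before `torch.cuda.amp` existed): compute in half precision, keep FP32 "master" weights for the update, and scale the loss so small gradients don't underflow. A rough sketch with illustrative names:

```python
import torch

def fp16_train_step(model, xb, yb, loss_fn, master_params, opt,
                    loss_scale: float = 512.0):
    """One mixed-precision step. Assumes `model` has been .half()'ed
    (BatchNorm layers are usually kept in FP32), `master_params` are FP32
    copies of its parameters, and `opt` was built over `master_params`."""
    # Forward/backward in FP16, with the loss scaled up to avoid underflow.
    loss = loss_fn(model(xb.half()), yb)
    (loss * loss_scale).backward()

    # Move the FP16 grads onto the FP32 master params, undoing the scale.
    for master, p in zip(master_params, model.parameters()):
        if p.grad is not None:
            master.grad = p.grad.float() / loss_scale

    opt.step()          # the weight update happens in FP32
    opt.zero_grad()
    model.zero_grad()

    # Copy the updated FP32 weights back into the FP16 model.
    for master, p in zip(master_params, model.parameters()):
        p.data.copy_(master.data)
```

Here `master_params` would be something like `[p.detach().clone().float().requires_grad_() for p in model.parameters()]`.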
Papers covered in the course:
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- Understanding the difficulty of training deep feedforward neural networks
- Fixup Initialization: Residual Learning Without Normalization
- All you need is a good init
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
- Self-Normalizing Neural Networks
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Layer Normalization
- Instance Normalization: The Missing Ingredient for Fast Stylization
- Group Normalization
- Revisiting Small Batch Training for Deep Neural Networks
- Decoupled Weight Decay Regularization
- L2 Regularization versus Batch and Weight Normalization
- Norm matters: efficient and accurate normalization schemes in deep networks
- Three Mechanisms of Weight Decay Regularization
- Adam: A Method for Stochastic Optimization
- Reducing BERT Pre-Training Time from 3 Days to 76 Minutes (LAMB optimizer paper)
- Going Deeper with Convolutions
- mixup: Beyond Empirical Risk Minimization
- Rethinking the Inception Architecture for Computer Vision (label smoothing is introduced in Section 7)
- Bag of Tricks for Image Classification with Convolutional Neural Networks (XResNets)
- Regularizing and Optimizing LSTM Language Models (AWD-LSTM)
- Universal Language Model Fine-tuning for Text Classification (ULMFiT)