Applied Deep Learning (YouTube Playlist)

Course Objectives & Prerequisites:

This is a two-semester course designed primarily for graduate students. Undergraduate students with demonstrated strong backgrounds in probability, statistics (e.g., linear and logistic regression), numerical linear algebra, and optimization are also welcome to register. The objective is to familiarize students with state-of-the-art deep learning techniques employed in industry. Deep learning is a field that witnesses a mini-revolution every few months, so it is very important that students registering for this course are eager to learn new concepts. Much of deep learning is software engineering; consequently, students should be able to write clean code in their assignments. Python is the programming language used in this course. Familiarity with TensorFlow and PyTorch is a plus but not a requirement. However, students must be willing to do the hard work of learning and using these two frameworks as the course progresses.

Part I Topics (Fall Semester)

Part II Topics (Spring Semester)

References

Training Deep Neural Networks

  • An overview of gradient descent optimization algorithms

Computer Vision; Image Classification; Large Networks

  • Multi-column Deep Neural Networks for Image Classification
  • ImageNet Classification with Deep Convolutional Neural Networks (code)
  • Dropout: A Simple Way to Prevent Neural Networks from Overfitting (code)
  • Network In Network
  • Very Deep Convolutional Networks for Large-Scale Image Recognition (code)
  • Going Deeper with Convolutions
  • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  • Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  • Rethinking the Inception Architecture for Computer Vision
  • Training Very Deep Networks
  • Deep Residual Learning for Image Recognition (code)
  • Identity Mappings in Deep Residual Networks (code)
  • Wide Residual Networks (code)
  • Aggregated Residual Transformations for Deep Neural Networks (code)
  • Densely Connected Convolutional Networks (code)
  • Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
  • mixup: Beyond Empirical Risk Minimization (code)
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (code)
  • Squeeze-and-Excitation Networks (code)
  • CBAM: Convolutional Block Attention Module (code)
  • Random Erasing Data Augmentation (code)
  • Spatial Transformer Networks
  • Dynamic Routing Between Capsules
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (code)
  • MLP-Mixer: An all-MLP Architecture for Vision (code)
  • High-Performance Large-Scale Image Recognition Without Normalization (code)

Computer Vision; Image Classification; Small Networks

  • Distilling the Knowledge in a Neural Network
  • Learning both Weights and Connections for Efficient Neural Networks
  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (code)
  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (code)
  • XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (code)
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (code)
  • Xception: Deep Learning with Depthwise Separable Convolutions (code)
  • MobileNetV2: Inverted Residuals and Linear Bottlenecks (code)
  • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (code)

Computer Vision; Image Classification; AutoML

  • Neural Architecture Search With Reinforcement Learning (code)
  • Learning Transferable Architectures for Scalable Image Recognition
  • Regularized Evolution for Image Classifier Architecture Search (code)
  • Evolving Deep Neural Networks
  • Efficient Neural Architecture Search via Parameter Sharing (code)
  • DARTS: Differentiable Architecture Search (code)
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (code)
  • MnasNet: Platform-Aware Neural Architecture Search for Mobile (code)

Computer Vision; Image Classification; Robustness

  • Intriguing properties of neural networks
  • Explaining and Harnessing Adversarial Examples
  • Adversarial Examples in the Physical World
  • The Limitations of Deep Learning in Adversarial Settings
  • Practical Black-Box Attacks against Machine Learning
  • Towards Evaluating the Robustness of Neural Networks (code)
  • Towards Deep Learning Models Resistant to Adversarial Attacks (code)
  • Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (code)
  • One Pixel Attack for Fooling Deep Neural Networks

Computer Vision; Image Classification; Visualizing & Understanding

  • Visualizing and Understanding Convolutional Networks
  • Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
  • Striving for Simplicity: The All Convolutional Net
  • “Why Should I Trust You?” Explaining the Predictions of Any Classifier (code)
  • Learning Deep Features for Discriminative Localization (code)
  • Understanding Deep Learning Requires Rethinking Generalization
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (code)
  • A Unified Approach to Interpreting Model Predictions (code)

Computer Vision; Image Classification; Transfer Learning

  • How transferable are features in deep neural networks? (code)
  • DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (code)
  • CNN Features off-the-shelf: an Astounding Baseline for Recognition
  • Return of the Devil in the Details: Delving Deep into Convolutional Nets (code)
  • Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks (code)

Computer Vision; Image Classification; Domain Adaptation

  • Domain-Adversarial Training of Neural Networks (code)
  • Adversarial Discriminative Domain Adaptation

Computer Vision; Image Classification; Few-shot Learning

  • Matching Networks for One Shot Learning
  • Prototypical Networks for Few-shot Learning (code)
  • Learning to Compare: Relation Network for Few-Shot Learning

Computer Vision; Image Classification; Federated Learning

  • Communication-Efficient Learning of Deep Networks from Decentralized Data

Computer Vision; Image Classification; Self-training & Contrastive Learning

  • Self-training with Noisy Student improves ImageNet classification (code)
  • A Simple Framework for Contrastive Learning of Visual Representations (code)
  • Momentum Contrast for Unsupervised Visual Representation Learning (code)

Computer Vision; Image Transformation; Semantic Segmentation

  • Fully Convolutional Networks for Semantic Segmentation (code)
  • Learning Deconvolution Network for Semantic Segmentation (code)
  • U-Net: Convolutional Networks for Biomedical Image Segmentation (code)
  • DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (code)
  • Multi-scale Context Aggregation by Dilated Convolutions (code)
  • SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
  • Pyramid Scene Parsing Network (code)
  • Rethinking Atrous Convolution for Semantic Image Segmentation
  • What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
  • RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation (code)
  • Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (code)
  • Dual Attention Network for Scene Segmentation (code)

Computer Vision; Image Transformation; Super-Resolution, Denoising, and Colorization

  • Learning a Deep Convolutional Network for Image Super-Resolution (code)
  • Perceptual Losses for Real-Time Style Transfer and Super-Resolution
  • Image Style Transfer Using Convolutional Neural Networks (code)
  • Accurate Image Super-Resolution Using Very Deep Convolutional Networks (code)
  • Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
  • Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (code)
  • Enhanced Deep Residual Networks for Single Image Super-Resolution (code)
  • The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (code)

Computer Vision; Pose Estimation

  • Stacked Hourglass Networks for Human Pose Estimation (code)
  • Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (code)

Computer Vision; Image Transformation; Optical Flow and Depth Estimation

  • FlowNet: Learning Optical Flow with Convolutional Networks
  • FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks (code)

Computer Vision; Object Detection; Two Stage Detectors

  • A Survey on Performance Metrics for Object-Detection Algorithms (code)
  • Rich feature hierarchies for accurate object detection and semantic segmentation (code)
  • Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
  • Fast R-CNN (code)
  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (code)
  • R-FCN: Object Detection via Region-based Fully Convolutional Networks (code)
  • Feature Pyramid Networks for Object Detection
  • Deformable Convolutional Networks (code)
  • Mask R-CNN (code)

Computer Vision; Object Detection; One Stage Detectors

  • OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (code)
  • You Only Look Once: Unified, Real-Time Object Detection (code)
  • SSD: Single Shot MultiBox Detector (code)
  • YOLO9000: Better, Faster, Stronger (code)
  • Focal Loss for Dense Object Detection
  • Speed/Accuracy Trade-Offs For Modern Convolutional Object Detectors
  • YOLOv3: An Incremental Improvement (code)
  • End-to-End Object Detection with Transformers (code)

Computer Vision; Face Recognition and Detection

  • DeepFace: Closing the Gap to Human-Level Performance in Face Verification
  • FaceNet: A Unified Embedding for Face Recognition and Clustering
  • Deep Face Recognition
  • Deep Learning Face Attributes in the Wild
  • Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks (code)
  • A Discriminative Feature Learning Approach for Deep Face Recognition
  • ArcFace: Additive Angular Margin Loss for Deep Face Recognition (code)

Computer Vision; Video

  • 3D Convolutional Neural Networks for Human Action Recognition
  • Large-scale Video Classification with Convolutional Neural Networks (code)
  • Two-Stream Convolutional Networks for Action Recognition in Videos
  • Learning Spatiotemporal Features with 3D Convolutional Networks (code)
  • Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors (code)
  • Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (code)
  • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (code)
  • Non-local Neural Networks (code)
  • Group Normalization (code)
  • Fully-Convolutional Siamese Networks for Object Tracking (code)
  • Robust Consistent Video Depth Estimation (code)

Computer Vision; 3D

  • V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation (code)
  • PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (code)
  • PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (code)
  • Dynamic Graph CNN for Learning on Point Clouds (code)

Natural Language Processing; Word Representations

  • Linguistic Regularities in Continuous Space Word Representations
  • Distributed Representations of Words and Phrases and their Compositionality
  • Efficient Estimation of Word Representations in Vector Space (code)
  • GloVe: Global Vectors for Word Representation (code)
  • Enriching Word Vectors with Subword Information (code)

Natural Language Processing; Text Classification

  • Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (code)
  • Convolutional Neural Networks for Sentence Classification (code)
  • Distributed Representations of Sentences and Documents
  • Effective Use of Word Order for Text Categorization with Convolutional Neural Networks (code)
  • A Convolutional Neural Network for Modelling Sentences
  • A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
  • Character-level Convolutional Networks for Text Classification (code)
  • Bag of Tricks for Efficient Text Classification (code)
  • Hierarchical Attention Networks for Document Classification
  • Neural Architectures For Named Entity Recognition (code) (code)
  • Universal Language Model Fine-tuning for Text Classification (code)

Natural Language Processing; Neural Machine Translation

  • Neural Machine Translation by Jointly Learning to Align and Translate
  • Sequence to Sequence Learning with Neural Networks
  • Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
  • On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
  • Effective Approaches to Attention-based Neural Machine Translation (code)
  • Neural Machine Translation of Rare Words with Subword Units (code)
  • Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
  • Convolutional Sequence to Sequence Learning (code)
  • Attention Is All You Need (code)
  • Reformer: The Efficient Transformer (code)
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (code)

Natural Language Processing; Language Modeling

  • Deep contextualized word representations (code)
  • Improving Language Understanding by Generative Pre-Training (code)
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (code)
  • Language Models are Unsupervised Multitask Learners (code)
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (code)
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (code)
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (code)
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding (code)
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (code)
  • Cross-lingual Language Model Pretraining (code)
  • Language Models are Few-Shot Learners (code)
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (code)
  • Pay Attention to MLPs

Multimodal Learning

  • Long-term Recurrent Convolutional Networks for Visual Recognition and Description
  • Show and Tell: A Neural Image Caption Generator
  • Deep Visual-Semantic Alignments for Generating Image Descriptions
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (code)
  • Layer Normalization
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (code)
  • Zero-Shot Text-to-Image Generation (code)

Generative Networks

  • Auto-Encoding Variational Bayes
  • Stochastic Backpropagation and Approximate Inference in Deep Generative Models
  • Generative Adversarial Nets (code)
  • Conditional Generative Adversarial Nets
  • Unsupervised representation learning with deep convolutional generative adversarial networks (code)
  • Improved Techniques for Training GANs (code)
  • InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (code)
  • Context Encoders: Feature Learning by Inpainting (code)
  • Least Squares Generative Adversarial Networks (code)
  • Image-to-Image Translation with Conditional Adversarial Networks (code)
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (code)
  • Wasserstein GAN (code)
  • Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
  • Improved Training of Wasserstein GANs (code)
  • Progressive growing of GANs for improved quality, stability, and variation (code)
  • GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (code)
  • Spectral Normalization for Generative Adversarial Networks (code)
  • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code)
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis (code)
  • A Style-Based Generator Architecture for Generative Adversarial Networks (code)
  • Self-Attention Generative Adversarial Networks (code)
  • StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (code)
  • Analyzing and Improving the Image Quality of StyleGAN (code)

Speech & Music

  • Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs)
  • Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
  • Speech Recognition with Deep Recurrent Neural Networks
  • Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (code)
  • Towards End-to-End Speech Recognition with Recurrent Neural Networks
  • Deep Speech: Scaling up end-to-end speech recognition
  • WaveNet: A Generative Model for Raw Audio
  • LSTM: A Search Space Odyssey
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
  • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
  • Jasper: An End-to-End Convolutional Neural Acoustic Model (code)
  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (code)

Reinforcement Learning

  • Playing Atari with Deep Reinforcement Learning
  • Human-level Control through Deep Reinforcement Learning
  • Continuous Control with Deep Reinforcement Learning
  • Trust Region Policy Optimization (code)
  • Conjugate Gradient Method
  • Mastering the game of Go with deep neural networks and tree search
  • Asynchronous Methods for Deep Reinforcement Learning
  • Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (code)
  • Deep Reinforcement Learning with Double Q-Learning
  • End to End Learning for Self-Driving Cars
  • End-to-End Training of Deep Visuomotor Policies
  • Mastering the game of Go without human knowledge
  • A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
  • Proximal Policy Optimization Algorithms
  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (code) (code)
  • Overcoming catastrophic forgetting in neural networks
  • Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
  • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (code)

Graph Neural Networks

  • DeepWalk: Online Learning of Social Representations (code)
  • LINE: Large-scale Information Network Embedding (code)
  • node2vec: Scalable Feature Learning for Networks (code)
  • Semi-Supervised Classification with Graph Convolutional Networks (code)
  • Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (code)
  • Inductive Representation Learning on Large Graphs (code)
  • Graph Attention Networks (code)
  • How Powerful Are Graph Neural Networks? (code)

Recommender Systems

  • Session-based Recommendations with Recurrent Neural Networks (code)
  • AutoRec: Autoencoders Meet Collaborative Filtering
  • Wide & Deep Learning for Recommender Systems
  • Neural Collaborative Filtering (code)
  • Neural Factorization Machines for Sparse Predictive Analytics (code)
  • DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
  • Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks (code)
  • Variational Autoencoders for Collaborative Filtering (code)
  • Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding (code)
  • Deep Learning Recommendation Model for Personalization and Recommendation Systems (code)