CS231n learning notes
Website: Convolutional Neural Networks for Visual Recognition (Spring 2017)
Video: CS231n Spring 2017
Course Syllabus
slides [done!!!]
- Computer vision overview
- Historical context
- Course logistics
video [done!!!]
slides [done!!!]
- The data-driven approach
- K-nearest neighbor
- Linear classification I
video [done!!!]
python/numpy tutorial [done!!!]
image classification notes [done!!!]
- Intro to Image Classification, data-driven approach, pipeline
- Nearest Neighbor Classifier
- k-Nearest Neighbor
- Validation sets, Cross-validation, hyperparameter tuning
- Pros/Cons of Nearest Neighbor
- Summary
- Summary: Applying kNN in practice
- If your data is very high-dimensional, consider using a dimensionality reduction technique such as PCA (wiki ref, CS229 ref, blog ref) or even Random Projections (see the PCA + kNN sketch after this list).
- Further Reading
Here are some (optional) links you may find interesting for further reading:
- A Few Useful Things to Know about Machine Learning, where section 6 is especially relevant, though the whole paper is warmly recommended reading.
- Recognizing and Learning Object Categories, a short course of object categorization at ICCV 2005.
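The PCA suggestion above can be sketched in a few lines of numpy. This is only an illustrative sketch (random data, 100 components, plain 1-NN), not code from the assignment:

```python
import numpy as np

# Toy data: 500 "images" flattened to 3072-dim rows (e.g. 32x32x3), 10 queries.
X_train = np.random.randn(500, 3072)
y_train = np.random.randint(0, 10, size=500)
X_test = np.random.randn(10, 3072)

# PCA via SVD of the zero-centered training data.
mean = X_train.mean(axis=0)
Xc = X_train - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:100].T                      # keep the top 100 principal components

# Project both sets into the reduced space, then run plain 1-NN (L2 distance).
X_train_pca = Xc.dot(V_k)
X_test_pca = (X_test - mean).dot(V_k)
dists = np.linalg.norm(X_test_pca[:, None, :] - X_train_pca[None, :, :], axis=2)
predictions = y_train[np.argmin(dists, axis=1)]
print(predictions)
```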
linear classification notes [done!!!]
- Intro to Linear classification
- Linear score function
- Interpreting a linear classifier
- Loss function
- Multiclass SVM
- For example, it turns out that including the L2 penalty leads to the appealing max margin property in SVMs (see the CS229 lecture notes for full details if you are interested); a loss sketch follows this list.
- Softmax classifier
- SVM vs Softmax
- Interactive Web Demo of Linear Classification
- Summary
- Further Reading
These readings are optional and contain pointers of interest.
- Deep Learning using Linear Support Vector Machines from Charlie Tang 2013 presents some results claiming that the L2SVM outperforms Softmax.
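To go with the Multiclass SVM and L2-regularization bullets above, here is a small vectorized sketch of the SVM loss with the delta = 1 margin used in the notes; the shapes and the 0.01 initialization are just illustrative:

```python
import numpy as np

def svm_loss(W, X, y, reg, delta=1.0):
    """Multiclass SVM loss with L2 regularization.
    W: (D, C) weights, X: (N, D) rows of data, y: (N,) integer labels."""
    N = X.shape[0]
    scores = X.dot(W)                                  # (N, C) class scores
    correct = scores[np.arange(N), y][:, None]         # (N, 1) score of the true class
    margins = np.maximum(0, scores - correct + delta)  # hinge margins
    margins[np.arange(N), y] = 0                       # the true class contributes no loss
    data_loss = margins.sum() / N
    reg_loss = reg * np.sum(W * W)                     # L2 penalty (behind the max margin property)
    return data_loss + reg_loss

# Tiny usage example with random data.
W = 0.01 * np.random.randn(3072, 10)
X = np.random.randn(5, 3072)
y = np.array([0, 3, 9, 2, 1])
print(svm_loss(W, X, y, reg=1e-3))
```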
slides [done!!!]
- Linear classification II
- Higher-level representations, image features
- Optimization, stochastic gradient descent
video [done!!!]
linear classification notes [done!!!]
same as Lecture 2: linear classification notes
optimization notes [done!!!]
- Introduction
- Visualizing the loss function
- a Stanford class on the topic of convex optimization (other project)
- Subderivative
- Optimization
- Strategy #1: Random Search
- Strategy #2: Random Local Search
- Strategy #3: Following the gradient
- Computing the gradient
- Numerically with finite differences (a short gradient-check sketch follows this list)
- Analytically with calculus
- Gradient descent
- Summary
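The finite-difference bullet above boils down to the centered formula df/dx ≈ (f(x+h) − f(x−h)) / 2h. A minimal sketch, checked against the analytic gradient of f(x) = Σx²; this is my own toy check, not the course starter code:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h; fp = f(x)     # f(x + h)
        x[idx] = old - h; fm = f(x)     # f(x - h)
        x[idx] = old                    # restore the original value
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# Check against the analytic gradient of f(x) = sum(x^2), which is 2x.
x = np.random.randn(3, 4)
num = numerical_gradient(lambda z: np.sum(z ** 2), x)
ana = 2 * x
rel_error = np.abs(num - ana) / np.maximum(1e-8, np.abs(num) + np.abs(ana))
print(rel_error.max())   # should be tiny (~1e-9 or so)
```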
slides [done!!!]
- Backpropagation
- Multi-layer Perceptrons
- The neural viewpoint
video [done!!!]
backprop notes [done!!!]
- Introduction
- Simple expressions, interpreting the gradient
- Compound expressions, chain rule, backpropagation
- Intuitive understanding of backpropagation
- Modularity: Sigmoid example (see the sketch after this list)
- Backprop in practice: Staged computation
- Patterns in backward flow
- Gradients for vectorized operations
- Summary
- References
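For the sigmoid example and staged computation, the whole forward/backward pass fits in a few lines. A sketch using the local gradient σ'(z) = σ(z)(1 − σ(z)); the particular numbers follow the notes' 2-D neuron example, but any values would do:

```python
import numpy as np

# Forward pass, staged: z = w0*x0 + w1*x1 + w2, then s = sigmoid(z)
w = np.array([2.0, -3.0, -3.0])        # w0, w1, bias
x = np.array([-1.0, -2.0])
z = w[0] * x[0] + w[1] * x[1] + w[2]
s = 1.0 / (1.0 + np.exp(-z))           # sigmoid output (~0.73 here)

# Backward pass: the sigmoid gate's local gradient is s * (1 - s),
# then the chain rule distributes it through the multiply/add gates.
dz = s * (1 - s)                                   # ds/dz
dw = np.array([x[0] * dz, x[1] * dz, 1.0 * dz])    # gradients on w0, w1, bias
dx = np.array([w[0] * dz, w[1] * dz])              # gradients on x0, x1
print(s, dw, dx)
```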
linear backprop example [done!!!]
derivatives notes (optional) [done!!!]
Efficient BackProp (optional) [done!!!]
Related (optional) [done!!!]
slides [done!!!]
- History
- Convolution and pooling
- ConvNets outside vision
video [done!!!]
ConvNet notes [done!!!]
- Architecture Overview
- ConvNet Layers
- Convolutional Layer
- The Krizhevsky et al. architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3] (the output-size arithmetic is sketched after this list).
- However, the benefit is that there are many very efficient implementations of Matrix Multiplication that we can take advantage of (for example, in the commonly used BLAS API).
- As an aside, several papers use 1x1 convolutions, as first investigated by Network in Network.
- A recent development (e.g. see paper by Fisher Yu and Vladlen Koltun) is to introduce one more hyperparameter to the CONV layer called the dilation.
- Pooling Layer
- Many people dislike the pooling operation and think that we can get away without it. For example, Striving for Simplicity: The All Convolutional Net proposes to discard the pooling layer in favor of an architecture that consists only of repeated CONV layers.
- Normalization Layer
- For various types of normalizations, see the discussion in Alex Krizhevsky’s cuda-convnet library API.
- Fully-Connected Layer
- Converting Fully-Connected Layers to Convolutional Layers
- An IPython Notebook on Net Surgery shows how to perform the conversion in practice, in code (using Caffe)
- ConvNet Architectures
- Layer Patterns
- You should rarely ever have to train a ConvNet from scratch or design one from scratch. I also made this point at the Deep Learning school.
- Layer Sizing Patterns
- Case Studies (LeNet / AlexNet / ZFNet / GoogLeNet / VGGNet)
- LeNet (LeNet)
- AlexNet (AlexNet, ImageNet ILSVRC challenge)
- ZF Net (ZF Net)
- GoogLeNet (Szegedy et al, Inception-v4)
- VGGNet (VGGNet, pretrained model)
- ResNet (Residual Network, batch normalization, some recent experiments, Kaiming’s presentation (video, slides), Kaiming He et al. Identity Mappings in Deep Residual Networks (published March 2016))
- Computational Considerations
- Additional References
- Soumith benchmarks for CONV performance
- ConvNetJS CIFAR-10 demo allows you to play with ConvNet architectures and see the results and computations in real time, in the browser.
- Caffe, one of the popular ConvNet libraries.
- State of the art ResNets in Torch7
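To make the output-size arithmetic referenced above concrete, the (W − F + 2P)/S + 1 formula from the notes can be wrapped in a tiny helper; the helper itself is mine, the example numbers are the standard AlexNet/VGG ones:

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial output size of a CONV/POOL layer:
    W = input width/height, F = filter size, S = stride, P = zero-padding."""
    out = (W - F + 2 * P) / S + 1
    assert out == int(out), "hyperparameters do not tile the input evenly"
    return int(out)

# AlexNet conv1: 227x227 input, 11x11 filters, stride 4, no padding -> 55x55
print(conv_output_size(227, 11, S=4, P=0))   # 55
# A 3x3 filter with stride 1 and padding 1 preserves spatial size, e.g. VGG:
print(conv_output_size(224, 3, S=1, P=1))    # 224
# Pooling with F=2, S=2 halves the spatial size:
print(conv_output_size(224, 2, S=2, P=0))    # 112
```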
slides [done!!!]
- Activation functions, initialization, dropout, batch normalization
video [done!!!]
Neural Nets notes 1 [done!!!]
- Quick intro without brain analogies
- Modeling one neuron
- Biological motivation and connections
- Single neuron as a linear classifier
- Commonly used activation functions (a numpy sketch follows this list)
- Tanh, Krizhevsky et al
- Leaky ReLU, Delving Deep into Rectifiers
- Maxout, One relatively popular choice is the Maxout neuron (introduced recently by Goodfellow et al.)
- Neural Network architectures
- Layer-wise organization
- Example feed-forward computation
- Representational power
- see Approximation by Superpositions of Sigmoidal Function from 1989 (pdf), or this intuitive explanation from Michael Nielsen
- The full picture is much more involved and a topic of much recent research. If you are interested in these topics we recommend for further reading:
  - [x] Deep Learning book in press by Bengio, Goodfellow, Courville, in particular Chapter 6.4.
  - [x] Do Deep Nets Really Need to be Deep?
  - [x] FitNets: Hints for Thin Deep Nets
- Setting number of layers and their sizes
- but some attempts to understand these objective functions have been made, e.g. in a recent paper The Loss Surfaces of Multilayer Networks.
- Summary
- Additional references
- deeplearning.net tutorial with Theano
- ConvNetJS demos for intuitions
- Michael Nielsen’s tutorials
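The commonly used activation functions above are one-liners in numpy. A sketch; the 0.01 leak and the two-piece maxout are typical choices, not the only ones:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to [0, 1]

def tanh(x):
    return np.tanh(x)                      # squashes to [-1, 1], zero-centered

def relu(x):
    return np.maximum(0, x)                # max(0, x); units can "die" if inputs stay negative

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small negative slope instead of a hard zero

def maxout(w1x_b1, w2x_b2):
    # Maxout generalizes ReLU/leaky ReLU: max of two linear pieces (doubles the parameters).
    return np.maximum(w1x_b1, w2x_b2)

x = np.linspace(-3, 3, 7)
print(relu(x), leaky_relu(x))
```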
Neural Nets notes 2 [done!!!]
- Setting up the data and the model
- Data Preprocessing
- Weight Initialization
- Batch Normalization
- Regularization (L2/L1/Maxnorm/Dropout)
- Dropout (Dropout: A Simple Way to Prevent Neural Networks from Overfitting); an inverted-dropout sketch follows this list. Recommended further reading for an interested reader includes:
- Dropout paper by Srivastava et al. 2014.
- Dropout Training as Adaptive Regularization: “we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix”.
- Loss functions
- Summary
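The dropout bullet above is usually implemented as "inverted dropout": drop and rescale at train time so the forward pass at test time is unchanged. A sketch with an illustrative keep probability p = 0.5 and a made-up one-layer network:

```python
import numpy as np

p = 0.5  # probability of keeping a unit active (illustrative choice)

def train_step(X, W, b):
    H = np.maximum(0, X.dot(W) + b)            # hidden layer (ReLU)
    mask = (np.random.rand(*H.shape) < p) / p  # inverted dropout: drop and rescale now
    return H * mask

def predict(X, W, b):
    return np.maximum(0, X.dot(W) + b)         # no dropout and no rescaling at test time

X = np.random.randn(4, 10)
W = 0.01 * np.random.randn(10, 20)
b = np.zeros(20)
print(train_step(X, W, b).shape, predict(X, W, b).shape)
```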
Neural Nets notes 3 [done!!!]
- Gradient checks
- Stick around active range of floating point. It’s a good idea to read through “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.
- Sanity checks
- Babysitting the learning process
- Loss function
- Train/val accuracy
- Weights:Updates ratio
- Activation/Gradient distributions per layer
- Visualization
- Parameter updates
- First-order (SGD), momentum, Nesterov momentum (update-rule sketches follow this list)
- We recommend this further reading to understand the source of these equations and the mathematical formulation of Nesterov’s Accelerated Momentum (NAG):
  - [x] Advances in optimizing Recurrent Networks by Yoshua Bengio, Section 3.5.
  - [x] Ilya Sutskever’s thesis (pdf) contains a longer exposition of the topic in section 7.2.
- Annealing the learning rate
- Second-order methods
- Additional references:
  - [x] Large Scale Distributed Deep Networks is a paper from the Google Brain team, comparing L-BFGS and SGD variants in large-scale distributed optimization.
  - [x] SFO algorithm strives to combine the advantages of SGD with advantages of L-BFGS.
- Per-parameter adaptive learning rates (Adagrad, RMSProp)
- Adagrad is an adaptive learning rate method originally proposed by Duchi et al.
- RMSprop: everyone who uses this method in their work currently cites slide 29 of Lecture 6 of Geoff Hinton’s Coursera class.
- Adam is a recently proposed update that looks a bit like RMSProp with momentum.
- Unit Tests for Stochastic Optimization proposes a series of tests as a standardized benchmark for stochastic optimization.
- Hyperparameter Optimization
- Prefer random search to grid search. As argued by Bergstra and Bengio in Random Search for Hyper-Parameter Optimization, “randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid”
- Evaluation
- Model Ensembles
- Summary
- Additional References
- SGD tips and tricks from Leon Bottou
- Efficient BackProp (pdf) from Yann LeCun
- Practical Recommendations for Gradient-Based Training of Deep Architectures from Yoshua Bengio
- Stochastic Gradient Descent Tricks
- Efficient BackProp
- Practical Recommendations for Gradient-Based Training of Deep Architectures
- Deep learning
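The parameter-update rules referenced above are short enough to write out. A sketch of vanilla SGD, momentum, and Adam following the formulas in the notes; the hyperparameter values are just the usual defaults:

```python
import numpy as np

learning_rate, mu = 1e-3, 0.9                # common defaults
beta1, beta2, eps = 0.9, 0.999, 1e-8

x = np.random.randn(10)                      # the parameter vector being optimized
dx = np.random.randn(10)                     # its gradient (from backprop)
v = np.zeros_like(x)                         # momentum velocity
m, vt, t = np.zeros_like(x), np.zeros_like(x), 1   # Adam state

# Vanilla SGD
x_sgd = x - learning_rate * dx

# SGD with momentum: build up velocity along consistent gradient directions
v = mu * v - learning_rate * dx
x_momentum = x + v

# Adam: per-parameter adaptive step from first/second moment estimates (with bias correction)
m = beta1 * m + (1 - beta1) * dx
vt = beta2 * vt + (1 - beta2) * (dx ** 2)
mhat = m / (1 - beta1 ** t)
vhat = vt / (1 - beta2 ** t)
x_adam = x - learning_rate * mhat / (np.sqrt(vhat) + eps)
```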
Assignment #1 due [done!!!]
k-Nearest Neighbor classifier [done!!!]
Training a Support Vector Machine [done!!!]
Implement a Softmax classifier [done!!!]
Two-Layer Neural Network [done!!!]
slides [done!!!]
video [done!!!]
Neural Nets notes 3 (same as Lecture 6) [done!!!]
slides [done!!!]
- Programming GPUs
video [done!!!]
slides [done!!!]
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks
- VGGNet: Very Deep Convolutional Networks for Large-Scale Image Recognition
- GoogLeNet: Going Deeper with Convolutions
- Network in Network (NiN)
- Improving ResNets
- Beyond ResNets
video [done!!!]
slides [done!!!]
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
video [done!!!]
- Code: min-char-rnn
- Code: char-rnn
- Code: neuraltalk2
Assignment #2 [done!!!]
Q1: Fully-connected Neural Network [done!!!]
Q2: Batch Normalization [done!!!]
Q3: Dropout [done!!!]
Q4: Convolutional Networks [done!!!]
Q5: PyTorch on CIFAR-10 / TensorFlow on CIFAR-10 [done!!!]
slides [done!!!]
- Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013
- Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014
!!! Problem: Very inefficient! Not reusing shared features between overlapping patches
Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!
- Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
- Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
- Toshev and Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks”, CVPR 2014
Treat localization as a regression problem!
Problem: Need to apply CNN to huge number of locations and scales, very computationally expensive!
- Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014.
- Girshick, “Fast R-CNN”, ICCV 2015.
- Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
- Redmon et al, “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
- Liu et al, “SSD: Single-Shot MultiBox Detector”, ECCV 2016
- Huang et al, “Speed/accuracy trade-offs for modern convolutional object detectors”, CVPR 2017
Aside: Object Detection + Captioning = Dense Captioning
- He et al, “Mask R-CNN”, arXiv 2017
video [done!!!]
slides [done!!!]
- First Layer: Visualize Filters
Krizhevsky, “One weird trick for parallelizing convolutional neural networks”, arXiv 2014
He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016
Huang et al, “Densely Connected Convolutional Networks”, CVPR 2017
- Last Layer: Nearest Neighbors, Dimensionality Reduction
Krizhevsky et al, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.
Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008
- Visualizing Activations
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
- Occlusion Experiments
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
- Saliency Maps
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.
- Visualizing CNN features: Gradient Ascent (a gradient-ascent sketch follows this list)
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.
Nguyen et al, “Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks”, ICML Visualization for Deep Learning Workshop 2016.
- Fooling Images / Adversarial Examples
- (1) Start from an arbitrary image
- (2) Pick an arbitrary class
- (3) Modify the image to maximize the class
- (4) Repeat until network is fooled
- DeepDream: Amplify existing features
Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural Networks”, Google Research Blog.
- Feature Inversion
Mahendran and Vedaldi, “Understanding Deep Image Representations by Inverting Them”, CVPR 2015
Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016. Copyright Springer, 2016.
- Neural Texture Synthesis
Gatys, Ecker, and Bethge, “Texture Synthesis Using Convolutional Neural Networks”, NIPS 2015
Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016. Copyright Springer, 2016.
- Neural Style Transfer
Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016.
Gatys, Ecker, and Bethge, “Texture Synthesis Using Convolutional Neural Networks”, NIPS 2015
Gatys, Ecker, and Bethge, “Image style transfer using convolutional neural networks”, CVPR 2016
Figure adapted from Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016.
Ulyanov et al, “Texture Networks: Feed-forward Synthesis of Textures and Stylized Images”, ICML 2016
Dumoulin, Shlens, and Kudlur, “A Learned Representation for Artistic Style”, ICLR 2017
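The gradient-ascent recipe behind “Visualizing CNN features: Gradient Ascent” and the fooling-images steps is the same loop: freeze the weights and take gradient steps on the image itself to push up one class score (minus a small L2 penalty on the image). The sketch below swaps the trained ConvNet for a random linear scorer purely so it is self-contained and runnable; `score_and_grad` is a stand-in I made up, not an assignment function:

```python
import numpy as np

# Stand-in "network": a linear scorer s = W.dot(x). In the lecture this would be a
# trained ConvNet; a linear model keeps the sketch self-contained.
num_classes, dim = 10, 3072
W = np.random.randn(num_classes, dim)

def score_and_grad(x, target):
    scores = W.dot(x)
    return scores[target], W[target]          # d(score_target)/dx for a linear model

# Gradient ascent on the image itself (weights frozen), with a small L2 penalty on the image.
x = np.random.randn(dim)                      # (1) start from an arbitrary image
target = 3                                    # (2) pick an arbitrary class
step, l2 = 1e-2, 1e-3
for _ in range(100):                          # (3) modify the image to maximize the class
    s, dx = score_and_grad(x, target)
    x += step * (dx - l2 * x)                 # ascend the score, keep the image small
print(W.dot(x).argmax() == target)            # (4) check whether the target class now wins
```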
video [done!!!]
slides [done!!!]
- Unsupervised Learning
- Generative Models
○ PixelRNN and PixelCNN
○ Variational Autoencoders (VAE)
○ Generative Adversarial Networks (GAN)
- PixelRNN
- PixelCNN
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
- Generative Adversarial Nets: Convolutional Architectures
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
- See also: https://github.com/soumith/ganhacks for tips and tricks for training GANs