
Modern-Computer-Vision

PT/BR notebooks for a highly visual course introducing the mathematics behind modern computer vision: neural nets as parametric non-linear function approximators, from linear classifiers to convnets.

Environment Setup

You can use any Jupyter environment you want, such as Anaconda, the standalone Jupyter Notebook, Kaggle, or Google Colab.

I use the Jupyter Notebook extension in VS Code.

Table of contents

  1. Sine approximation
  2. Softmax Classifier
  3. Two Layer Neural Net
  4. Airi
  5. Airi ConvNet
  6. Pytorch ConvNet
  7. Deep ConvNet (VGG)
  8. Pytorch Image Classification

1. Sine approximation

Although simple and intuitive, the sine function can't be easily written down as a closed-form equation like f(x) = x². The sine of an angle is given by the y-coordinate of the endpoint of a line drawn from the center of a circle at that angle, so you could always check sine and cosine values with a method as simple as taking a ruler and measuring them yourself on a circle of radius 1.

In this notebook, we describe this simple function as an optimization problem instead, and approximate it with small matrices (1x16, 16x16, and 16x1) using backpropagation and Mean Squared Error.
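As a rough illustration of the idea, here is a minimal sketch assuming PyTorch, a tanh non-linearity, and an Adam optimizer (the notebook's own code may differ):

```python
import math
import torch

# Minimal sketch (not the notebook's exact code): approximate sin(x) with
# the 1x16, 16x16, and 16x1 matrices mentioned above, trained with MSE.
x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)  # shape (256, 1)
y = torch.sin(x)

model = torch.nn.Sequential(
    torch.nn.Linear(1, 16),   # 1x16
    torch.nn.Tanh(),
    torch.nn.Linear(16, 16),  # 16x16
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),   # 16x1
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # Mean Squared Error against the true sine
    loss.backward()              # backpropagation computes the gradients
    optimizer.step()
```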

Final result:

[image]


2. Softmax Classifier (Linear classifier)

We'll now take the optimization approach with vectors instead of scalars, and treat image classification as an optimization problem.

In this notebook we'll learn common procedures and good practices, such as dataset preprocessing and weight initialization. We'll also understand the process of backpropagation as staged computation using the chain rule; that way, we'll be on the right track of thinking about deep learning models as DAGs (Directed Acyclic Graphs). We'll use softmax to create a linear classifier for the CIFAR-10 dataset and reach close to 40% accuracy on it using Negative Log Likelihood as our loss function! We visualize its neurons at the end of the notebook so we can understand what "learning" means for a neural net.
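For concreteness, here is a minimal NumPy sketch of one training step of such a classifier; the shapes, learning rate, and random data are illustrative assumptions, not the notebook's actual setup:

```python
import numpy as np

# Hypothetical sketch: one training step of a softmax linear classifier
# with Negative Log Likelihood loss on CIFAR-10-shaped data.
N, D, C = 64, 3072, 10            # batch size, pixels (32*32*3), classes
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))   # stand-in for preprocessed images
y = rng.integers(0, C, size=N)    # stand-in for ground-truth labels
W = 0.01 * rng.standard_normal((D, C))  # small random weight initialization

scores = X @ W                                   # (N, C) class scores
scores -= scores.max(axis=1, keepdims=True)      # numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(N), y]).mean()    # NLL loss

# Backprop as staged computation: dL/dscores first, then chain rule to dL/dW.
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N
dW = X.T @ dscores

W -= 1e-2 * dW                                   # one gradient descent step
```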

We see what the simplest pipeline for image classification as an optimization problem looks like:

[image]

We learn what it looks like as a graph and how to calculate its partial derivatives:

[image]

We also visualize what happens inside the neurons when we add the gradient, and what happens when we apply gradient descent, both visually and numerically:

[image]

[image]

We visualize the neurons after the learning process and interpret them:

[image]


3. Two Layer Neural Net

In this notebook we upgrade our softmax classifier by adding one more layer to it and by introducing the non-linear function ReLU. Other common procedures will be explained here too, such as training in batches, training a bigger model so we can get our feet wet with the neural net backward pass, and a more Pythonic style for our models, since everything will now live inside a class.
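A minimal sketch of what such a class might look like, assuming NumPy and hypothetical layer sizes (the notebook's class also carries the backward pass and loss):

```python
import numpy as np

# Hypothetical sketch: the forward pass of a two-layer net with a ReLU
# non-linearity, wrapped in a class as described above.
class TwoLayerNet:
    def __init__(self, input_dim=3072, hidden_dim=1000, num_classes=10):
        rng = np.random.default_rng(0)
        self.W1 = 0.01 * rng.standard_normal((input_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = 0.01 * rng.standard_normal((hidden_dim, num_classes))
        self.b2 = np.zeros(num_classes)

    def forward(self, X):
        h = np.maximum(0, X @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2              # class scores

net = TwoLayerNet()
scores = net.forward(np.random.randn(64, 3072))  # a batch of 64 images
print(scores.shape)  # (64, 10)
```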

We reach nearly 50% accuracy with this model. Here we'll visualize the neurons again, see what got better, and see what can't get better with our current approach. At the end of this notebook, we'll finally be ready to get into modern ConvNets!

We learn how to write optimized, vectorized operations for the derivatives:

[image]

We visualize our new graph for a two-layer neural net:

[image]

And we take a look at what 1000 neurons look like after training, trying to spot some of the templates that were found in CIFAR-10:

[image]


4. Airi

In this notebook we implement every layer we've learned in a modular way, and also introduce convolutions. Everything is inside the airi folder in this repo.

A linear layer implemented:

[image]
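In spirit, the interface of such a modular layer looks roughly like this hypothetical sketch (Airi's actual code may differ):

```python
import numpy as np

# Hypothetical sketch: a modular linear layer with a forward pass and a
# backward pass that returns the gradients needed for backpropagation.
class Linear:
    def __init__(self, in_features, out_features):
        self.W = 0.01 * np.random.randn(in_features, out_features)
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x                    # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, dout):
        self.dW = self.x.T @ dout     # gradient w.r.t. the weights
        self.db = dout.sum(axis=0)    # gradient w.r.t. the bias
        return dout @ self.W.T        # gradient w.r.t. the input
```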

We also take a look at different optimizers in this notebook, and see why plain SGD is often not used, with RMSProp, Adam, or even SGD + Momentum used instead:

[image]
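For reference, here is a minimal sketch of the update rules being compared; the hyperparameters are illustrative defaults, not the notebook's exact values:

```python
import numpy as np

# Hypothetical sketch of the optimizer update rules.
def sgd(w, dw, lr=1e-2):
    return w - lr * dw

def sgd_momentum(w, dw, v, lr=1e-2, mu=0.9):
    v = mu * v - lr * dw                          # velocity accumulates past gradients
    return w + v, v

def rmsprop(w, dw, cache, lr=1e-3, decay=0.99, eps=1e-8):
    cache = decay * cache + (1 - decay) * dw**2   # running avg of squared grads
    return w - lr * dw / (np.sqrt(cache) + eps), cache

def adam(w, dw, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * dw                    # first moment
    v = b2 * v + (1 - b2) * dw**2                 # second moment
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)  # bias correction
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```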


5. Airi ConvNet

In this notebook we visualize the weights of our implemented Convolutional Neural Network, written with our hand-designed machine learning library (airi). Its training script is inside the airi directory.

[image]

5x5 learned filters

[image]

feature maps

[image]

In this notebook we also take a closer look inside a much bigger convnet while reading Visualizing and Understanding Convolutional Networks (Zeiler & Fergus).


6. Pytorch ConvNet

In this notebook we build the very same convnet we built with Airi, but with PyTorch, and learn how simple it is to write and train models in PyTorch. We then extend it to a larger CNN to reach 80% accuracy on the CIFAR-10 validation set. In this notebook we also make use of Grad-CAM to visualize what our model is looking at to make its predictions:

[image]
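As a sketch of how compact such a convnet becomes in PyTorch, here is a small CIFAR-10 model built from the same operations we implemented in Airi; the layer sizes below are assumptions, not necessarily the notebook's:

```python
import torch.nn as nn

# Hypothetical sketch: a small CIFAR-10 convnet in PyTorch.
class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                           # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                           # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```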


7. Deep ConvNet (VGG)

In this notebook we build the VGG16 architecture using PyTorch, an old but very powerful classification model that relies on the operations we built in Airi (Conv2d, MaxPool, ReLU, Linear, Flatten, Softmax), plus dropout.
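For reference, VGG16's convolutional stack is usually expressed as a simple layer configuration; here is a sketch of how it can be built in PyTorch (the notebook's code may be organized differently):

```python
import torch.nn as nn

# The standard VGG16 configuration: numbers are output channels of 3x3
# convolutions, 'M' marks a 2x2 max-pooling layer.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg16_features(cfg=VGG16_CFG, in_channels=3):
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
```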

[image]


8. Pytorch Image Classification

In this last notebook, we take a dataset consisting of 15 cat breeds and classify them with an EfficientNet-B0, provided in the torchvision models package. This notebook is here as a general template for training classification models: we learn how to split a dataset, create different transforms for it, and turn it into a DataLoader. We also use Grad-CAM, now on big images, to visualize what our model is looking at to make its prediction. See below, at the last layer of EfficientNet-B0, what activates the Siamese neuron, and also a look at its second prediction (Maine Coon) and what activates that neuron:

Siamese:

[image]

Maine Coon:

[image]
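For completeness, here is a sketch of the template pieces described above (split, transforms, DataLoader, model); the "cats" folder path is an assumption, and the string weights argument requires torchvision 0.13+:

```python
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

# Hypothetical sketch of the training template. "cats" is an assumed
# ImageFolder-style directory, not necessarily the repo's real layout.
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),   # in practice the validation set
    transforms.ToTensor(),               # would get its own transform
])
dataset = torchvision.datasets.ImageFolder("cats", transform=tf)

# Split the dataset into train and validation subsets.
n_val = int(0.2 * len(dataset))
train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

# EfficientNet-B0 from torchvision, with its head replaced for 15 cat breeds.
model = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 15)
```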