CompVision

Some of my computer vision notebooks.

MNIST_1

Layer-1: Conv2d (5x5 kernels, 32 filters) + Max pooling (2, 2)
Layer-2: Conv2d (5x5 kernels, 64 filters) + Max pooling (2, 2) + dropout
Layer-3: Linear (1024 inputs, 10 outputs) + ReLU
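
The listing above maps onto a small PyTorch module roughly as follows; this is a minimal sketch, with the dropout rate assumed and the ReLU after the final linear layer kept as listed (spatial sizes shrink 28 → 12 → 4, so the flattened vector has 64 × 4 × 4 = 1024 features).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MNIST1(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)   # 28x28 -> 24x24
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)  # 12x12 -> 8x8
        self.dropout = nn.Dropout2d(0.25)              # dropout rate is an assumption
        self.fc = nn.Linear(64 * 4 * 4, 10)            # 1024 -> 10

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)     # -> 32 x 12 x 12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)     # -> 64 x 4 x 4
        x = self.dropout(x)
        x = torch.flatten(x, 1)                        # -> 1024
        return F.relu(self.fc(x))                      # ReLU after the linear layer, as listed
```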

MNIST_2

Layer-1: Conv2d (5x5 kernels, 32 filters) + Max pooling (2, 2)
Layer-2: Conv2d (5x5 kernels, 64 filters) + Max pooling (2, 2) + dropout
Layer-3: Conv2d (3x3 kernels, 64 filters, padding = 1)
Layer-4: Conv2d (3x3 kernels, 64 filters, padding = 1) + output of Layer-2 (residual connection)
Layer-5: Linear (1024 inputs, 10 outputs) + ReLU
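
A sketch of the same base network with the residual connection: the output of Layer-2 is added back after Layer-4. The dropout rate is again an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MNIST2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.dropout = nn.Dropout2d(0.25)                          # rate is an assumption
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64 * 4 * 4, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 32 x 12 x 12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 64 x 4 x 4
        skip = self.dropout(x)                       # output of Layer-2, kept for the skip
        x = F.relu(self.conv3(skip))                 # -> 64 x 4 x 4
        x = F.relu(self.conv4(x)) + skip             # residual connection back to Layer-2
        x = torch.flatten(x, 1)                      # -> 1024
        return F.relu(self.fc(x))
```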

MNIST_3

Layer-1: Conv2d (5x5 kernels, 32 filters) + Max pooling (2, 2)
Layer-2: Conv2d (5x5 kernels, 64 filters) + Max pooling (2, 2) + dropout
Layer-3: Conv2d (3x3 kernels, 64 filters, padding = 1)
Layer-4: Conv2d (3x3 kernels, 64 filters, padding = 1)
Layer-5: Conv2d (3x3 kernels, 64 filters, padding = 1)
Layer-6: Conv2d (3x3 kernels, 64 filters, padding = 1)
Layer-7: Linear (1024 inputs, 10 outputs) + ReLU
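
A sketch of the deeper plain stack: the four padded 3x3 convolutions keep the 4x4 spatial size, so the classifier input stays at 1024 features. Hyperparameters beyond those listed are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MNIST3(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.dropout = nn.Dropout2d(0.25)            # rate is an assumption
        # Layers 3-6: four 3x3 convolutions that preserve the 4x4 spatial size
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU())
            for _ in range(4)
        ])
        self.fc = nn.Linear(64 * 4 * 4, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 32 x 12 x 12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 64 x 4 x 4
        x = self.dropout(x)
        x = self.convs(x)                            # still 64 x 4 x 4
        x = torch.flatten(x, 1)                      # -> 1024
        return F.relu(self.fc(x))
```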

MNIST_4

Augmentation: random rotations + normalization
Layer-1: Conv2d (3x3 kernels, 32 filters)
Layer-2: Conv2d (3x3 kernels, 32 filters) + Max pooling (2, 2) + dropout
Layer-3: Conv2d (3x3 kernels, 64 filters, padding = 1)
Layer-4: Conv2d (3x3 kernels, 64 filters, padding = 1) + Max pooling (2, 2) + dropout
Layer-5: Conv2d (3x3 kernels, 128 filters, padding = 1) + Max pooling (2, 2) + batch normalization
Layer-6: Linear (1152 inputs, 10 outputs) + ReLU
The heavier dropout is countered by adding more channels, training for more epochs, and pooling later than in the previous models.
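
A sketch of MNIST_4 with an assumed augmentation pipeline; the rotation angle, dropout rates, and MNIST normalization statistics are assumptions. With 3x3 kernels and later pooling the final feature map is 128 × 3 × 3 = 1152, matching the linear layer above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

# Augmentation: random rotations + normalization (angle and statistics are assumptions)
train_transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

class MNIST4(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)                # 28x28 -> 26x26
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)               # 26x26 -> 24x24
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)    # 12x12 -> 12x12
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)    # 12x12 -> 12x12
        self.conv5 = nn.Conv2d(64, 128, kernel_size=3, padding=1)   # 6x6 -> 6x6
        self.bn = nn.BatchNorm2d(128)
        self.dropout1 = nn.Dropout2d(0.25)                          # rates are assumptions
        self.dropout2 = nn.Dropout2d(0.25)
        self.fc = nn.Linear(128 * 3 * 3, 10)                        # 1152 -> 10

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.dropout1(F.max_pool2d(F.relu(self.conv2(x)), 2))   # -> 32 x 12 x 12
        x = F.relu(self.conv3(x))
        x = self.dropout2(F.max_pool2d(F.relu(self.conv4(x)), 2))   # -> 64 x 6 x 6
        x = self.bn(F.max_pool2d(F.relu(self.conv5(x)), 2))         # -> 128 x 3 x 3
        x = torch.flatten(x, 1)                                     # -> 1152
        return F.relu(self.fc(x))
```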

Neural Style Transfer

PyTorch tutorial: https://pytorch.org/tutorials/advanced/neural_style_tutorial.html
A few examples in Sherlock_Examples.ipynb (with different weights for style loss and content loss)
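
For reference, the style loss in the linked tutorial compares Gram matrices of VGG feature maps; below is a minimal sketch of that loss. The notebook's own style/content weights are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (batch, channels, height, width) feature map from a VGG layer
    b, c, h, w = feat.size()
    features = feat.view(b * c, h * w)
    G = features @ features.t()
    return G / (b * c * h * w)          # normalize by the number of elements

class StyleLoss(nn.Module):
    """Style loss as in the PyTorch tutorial: MSE between Gram matrices."""
    def __init__(self, target_feat):
        super().__init__()
        self.target = gram_matrix(target_feat).detach()

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x

# The total objective is style_weight * style_loss + content_weight * content_loss;
# the notebook varies these two weights to trade style off against content.
```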

MNIST_SpatialTransformer

Paper: https://arxiv.org/abs/1506.02025
MNIST_4 with an intermediate spatial transformer layer
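
A minimal spatial transformer block in PyTorch, following the official STN tutorial rather than the notebook's exact localization network: a small CNN regresses a 2x3 affine matrix, which is then applied with affine_grid and grid_sample before the MNIST_4-style classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Predicts an affine transform from the input and resamples it (arXiv:1506.02025)."""
    def __init__(self):
        super().__init__()
        # Localization network sizes follow the PyTorch STN tutorial (an assumption here)
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(), nn.Linear(32, 2 * 3)
        )
        # Initialize the regressor to the identity transform
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        xs = self.localization(x)                      # -> 10 x 3 x 3 for 28x28 input
        theta = self.fc_loc(xs.view(-1, 10 * 3 * 3)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```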

Flipkart - Object detection

Detecting and localizing objects in a given image with PyTorch, built for the Flipkart object-detection contest.
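
The notebook's detector is not specified here; purely as an illustration, inference with torchvision's pretrained Faster R-CNN might look like the sketch below (the model choice and input path are assumptions).

```python
import torch
import torchvision
from torchvision.transforms import functional as TF
from PIL import Image

# Load a detector pretrained on COCO (which model the contest entry used is an assumption)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    prediction = model([TF.to_tensor(image)])[0]

# Keep confident detections; each box is (xmin, ymin, xmax, ymax)
keep = prediction["scores"] > 0.5
print(prediction["boxes"][keep], prediction["labels"][keep])
```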