Computer Vision Tutorials

This repo contains implementations of computer vision algorithms in TensorFlow 2 (tf2+). The code is easy to read and follow. If you can read Chinese, I also have a teaching website for studying AI models.

All toy implementations are organised as follows:

CNN
- Numpy Convolution mechanism
- LeNet
- VGG
- GoogLeNet
- ResNet
- DenseNet
- SENet
- MobileNetV1
- MobileNetV2
- Xception
- ShuffleNetV1
- ShuffleNetV2

Installation

$ git clone https://github.com/MorvanZhou/Computer-Vision
$ cd Computer-Vision
$ pip install -r requirements.txt

ConvMechanism

Convolution mechanism and feature map

code - gif result
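
As a rough illustration of the mechanism, here is a minimal NumPy sketch of a single-channel convolution that produces a feature map. It is not the repo's code: the function name `conv2d`, the "valid" (no padding) behaviour, and the toy kernel are assumptions chosen for brevity.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image and sum the element-wise products.
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # toy kernel: difference along the diagonal
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3)
```

A 4x4 input with a 2x2 kernel and stride 1 gives a 3x3 feature map, since the kernel fits in three positions along each axis.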

LeNet

Gradient-Based Learning Applied to Document Recognition

code - net structure

VGG

Very Deep Convolutional Networks for Large-Scale Image Recognition

A deep CNN built by stacking conv layers.

code - net structure
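
One commonly cited reason for stacking small convs, sketched as arithmetic: two stacked 3x3 convs cover the same 5x5 receptive field as a single 5x5 conv, with fewer weights. The helper names and the channel count of 64 below are illustrative assumptions, and biases are ignored.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weight count of one conv layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def receptive_field(num_layers, kernel_size=3):
    """Receptive field of stacked stride-1 convs: each layer adds kernel_size - 1."""
    return 1 + num_layers * (kernel_size - 1)

channels = 64  # assumed channel count, for illustration only
stacked_3x3 = 2 * conv_params(3, channels, channels)
single_5x5 = conv_params(5, channels, channels)

print(receptive_field(2))       # 5
print(stacked_3x3, single_5x5)  # 73728 102400
```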

GoogLeNet

Going Deeper with Convolutions

Uses multiple kernel sizes in parallel to capture local information at different scales.

code - net structure
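
The multi-kernel idea can be sketched in NumPy as parallel branches whose outputs are stacked along the channel axis. This is a simplified assumption, not the repo's model: it uses a single-channel input and random kernels, and it omits the 1x1 reduction convs and the pooling branch of the real Inception block.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 cross-correlation with zero padding, so output size equals input size.
    Assumes a square kernel with odd size."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))

# One branch per kernel size; each branch looks at a different neighbourhood size.
kernels = {k: rng.normal(size=(k, k)) for k in (1, 3, 5)}
branches = [conv2d_same(image, kernels[k]) for k in (1, 3, 5)]

# All branches keep the spatial size, so they can be combined along the channel axis.
block_output = np.stack(branches, axis=-1)
print(block_output.shape)  # (8, 8, 3)
```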

ResNet

Deep Residual Learning for Image Recognition

Adds residual connections for better gradient flow.

code - net structure
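
A minimal NumPy sketch of a residual connection, assuming dense layers as stand-ins for the conv layers (the names `residual_block`, `w1`, and `w2` are hypothetical). Because the block computes x + F(x), its derivative with respect to x contains an identity term, which gives gradients a direct path through the block.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)), with F being two dense layers standing in for convs."""
    fx = relu(x @ w1) @ w2
    return relu(x + fx)

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(4, 16)))  # non-negative input, so the final relu is a no-op here

# With all-zero weights F(x) = 0, and the block reduces to the identity mapping.
zeros = np.zeros((16, 16))
y = residual_block(x, zeros, zeros)
print(np.allclose(y, x))  # True
```

The zero-weight case shows why deeper residual nets are easy to start training: a block that has learned nothing simply passes its input through.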

DenseNet

Densely Connected Convolutional Networks

Compared with ResNet, each conv has fewer filters but sees the outputs of more previous layers.

code - net structure
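
The dense connectivity can be sketched as repeated concatenation. In this NumPy sketch each layer is a 1x1 conv stand-in (a matrix multiply over the channel axis) that adds only `growth_rate` new channels; the function name and all sizes are assumptions for illustration.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Each layer sees the concatenation of the input and all earlier layer outputs,
    and contributes only growth_rate new channels."""
    features = x
    for _ in range(num_layers):
        in_channels = features.shape[-1]
        w = rng.normal(scale=0.1, size=(in_channels, growth_rate))  # 1x1 conv stand-in
        new_features = np.maximum(features @ w, 0.0)
        features = np.concatenate([features, new_features], axis=-1)
    return features

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(out.shape)  # (8, 8, 64)
```

The output has 16 + 4 * 12 = 64 channels, and the first 16 are still the untouched input.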

SENet

Squeeze-and-Excitation Networks

SE is a module that learns to scale each feature map. It can be plugged into many CNN blocks. A larger reduction_ratio reduces the parameter count of the FC layers with only a limited drop in accuracy.

code - net structure
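
A minimal NumPy sketch of the squeeze and excitation steps, plus the FC weight count as a function of `reduction_ratio`. The helper names, the random weights, and the sizes are assumptions, and biases are ignored.

```python
import numpy as np

def se_module(x, w1, w2):
    """x: (H, W, C). Returns x with each channel rescaled by a learned factor in (0, 1)."""
    squeezed = x.mean(axis=(0, 1))                # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)       # excitation: FC + ReLU -> (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # FC + sigmoid -> (C,)
    return x * scale                              # broadcast over H and W

def se_fc_params(channels, reduction_ratio):
    """Weight count of the two FC layers (biases ignored)."""
    hidden = channels // reduction_ratio
    return 2 * channels * hidden

rng = np.random.default_rng(0)
channels, reduction_ratio = 64, 16
x = rng.normal(size=(8, 8, channels))
w1 = rng.normal(size=(channels, channels // reduction_ratio))
w2 = rng.normal(size=(channels // reduction_ratio, channels))
y = se_module(x, w1, w2)

print(y.shape)                                     # (8, 8, 64)
print(se_fc_params(64, 4), se_fc_params(64, 16))   # 2048 512
```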

MobileNetV1

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Decomposes a classical conv into two operations: depthwise (dw) + pointwise (pw). A small but effective CNN optimized for mobile (CPU).

code - net structure
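
The dw+pw decomposition can be sketched in NumPy as follows. This is an illustrative assumption rather than the repo's tf2 code: valid padding, stride 1, random weights, and hypothetical function names.

```python
import numpy as np

def depthwise_conv(x, dw_kernels):
    """x: (H, W, C), dw_kernels: (k, k, C). Each channel is filtered by its own
    k x k kernel; channels are not mixed. Valid padding, stride 1."""
    k = dw_kernels.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w, x.shape[2]))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * dw_kernels, axis=(0, 1))
    return out

def pointwise_conv(x, pw_weights):
    """1x1 conv: mixes channels at every pixel. pw_weights: (C_in, C_out)."""
    return x @ pw_weights

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
dw_kernels = rng.normal(size=(3, 3, 16))
pw_weights = rng.normal(size=(16, 32))
y = pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)

separable_params = dw_kernels.size + pw_weights.size  # 144 + 512
standard_params = 3 * 3 * 16 * 32                     # one classical 3x3 conv
print(y.shape)                            # (6, 6, 32)
print(separable_params, standard_params)  # 656 4608
```

For this toy layer the decomposed version needs about one seventh of the weights of the classical conv while producing an output of the same shape.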

MobileNetV2

MobileNetV2: Inverted Residuals and Linear Bottlenecks

MobileNetV2 is V1 with residual blocks and rearranged layers (residual+pw+dw+pw):

mobilenet v1: dw > pw
mobilenet v2: pw > dw > pw, which lets dw see more feature maps

code - net structure
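
A minimal NumPy sketch of the pw > dw > pw ordering with a residual connection. The expansion factor of 6, the ReLU6 activations, and the linear (activation-free) last pw follow the paper's description as I understand it; the function names, shapes, and random weights are assumptions.

```python
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def depthwise_same(x, kernels):
    """Stride-1 depthwise conv with zero padding. x: (H, W, C), kernels: (k, k, C)."""
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernels, axis=(0, 1))
    return out

def inverted_residual(x, w_expand, dw_kernels, w_project):
    h = relu6(x @ w_expand)                   # pw: expand to more feature maps
    h = relu6(depthwise_same(h, dw_kernels))  # dw: filters the expanded feature maps
    h = h @ w_project                         # pw: project back, no activation
    return x + h                              # residual connection

rng = np.random.default_rng(0)
channels, expansion = 8, 6
x = rng.normal(size=(5, 5, channels))
w_expand = rng.normal(scale=0.1, size=(channels, channels * expansion))
dw_kernels = rng.normal(scale=0.1, size=(3, 3, channels * expansion))
w_project = rng.normal(scale=0.1, size=(channels * expansion, channels))
y = inverted_residual(x, w_expand, dw_kernels, w_project)
print(y.shape)  # (5, 5, 8)
```

Here dw operates on 48 feature maps instead of the 8 it would see in a V1-style dw > pw block.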

Xception

Xception: Deep Learning with Depthwise Separable Convolutions

Just like MobileNetV2 but without the last pw (residual+pw+dw).

code - net structure

ShuffleNetV1

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Shuffles the output of the 1x1 conv and uses group conv to reduce connections and speed up computation. However, MobileNet performs better in this case, which may be because group conv cuts off some communication between feature maps.

code - net structure
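
The shuffle step can be sketched in NumPy with a reshape and a transpose. The function name and the channels-last layout are assumptions; the point is that after shuffling, every group in the next group conv receives channels from all groups of the previous one.

```python
import numpy as np

def channel_shuffle(x, groups):
    """x: (H, W, C). Interleaves channels so each group mixes with the others."""
    h, w, c = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(h, w, groups, c // groups)  # split channels into groups
    x = x.transpose(0, 1, 3, 2)               # swap the group and per-group axes
    return x.reshape(h, w, c)                 # flatten back to C channels

# Six channels in two groups: [0 1 2] and [3 4 5].
x = np.arange(6).reshape(1, 1, 6)
shuffled = channel_shuffle(x, groups=2)
print(shuffled.ravel())  # [0 3 1 4 2 5]
```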

ShuffleNetV2

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Further reduces parameters by replacing group conv with split+concat, and performs the shuffle at the end of the block, which speeds up computation. However, MobileNet performs better in this case, which may be because group conv cuts off some communication between feature maps.

code - net structure
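
A minimal NumPy sketch of the split+concat+shuffle flow for a stride-1 unit. The function names are hypothetical, and `branch_fn` is a placeholder for the unit's conv branch, which is not implemented here.

```python
import numpy as np

def channel_shuffle(x, groups=2):
    """x: (H, W, C). Interleaves channels across groups."""
    h, w, c = x.shape
    x = x.reshape(h, w, groups, c // groups).transpose(0, 1, 3, 2)
    return x.reshape(h, w, c)

def shufflenet_v2_unit(x, branch_fn):
    """Split channels in half, transform one half, concat, then shuffle at the end."""
    shortcut, branch = np.split(x, 2, axis=-1)         # channel split
    branch = branch_fn(branch)                         # only this half is computed on
    out = np.concatenate([shortcut, branch], axis=-1)  # concat instead of add
    return channel_shuffle(out, groups=2)              # mix the two halves

x = np.arange(8, dtype=float).reshape(1, 1, 8)
# Placeholder branch: multiply by 10 so transformed channels are easy to spot.
out = shufflenet_v2_unit(x, branch_fn=lambda b: b * 10)
print(out.ravel())  # [ 0. 40.  1. 50.  2. 60.  3. 70.]
```

Half of the channels pass through untouched, and the final shuffle interleaves them with the transformed half so the next unit's split sees both.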