Deep Neural Network from scratch on Image Classification - Dogs or Cats?

Phuong T.M. Chu and Hai Nguyen

"Deep Neural Network from scratch on Image Classification - Dogs or Cats?" is the project we completed after our 5th week of studying Machine Learning.

INTRODUCTION

The Dogs vs. Cats dataset, provided by Microsoft Research, contains 25,000 images of dogs and cats with the labels:

  • 1 = dog
  • 0 = cat

Project goals:

  1. Building a basic deep neural network from scratch to classify dog and cat images

  2. Tuning the model's hyperparameters to achieve high accuracy. This project explores how a deep neural network behaves as we tune these hyperparameters:

    • Learning rate
    • Number of hidden layers
    • Number of nodes in each hidden layer
    • Number of iterations

BUILDING DEEP NEURAL NETWORK FROM SCRATCH

In this notebook, we implemented all the functions required to build a deep neural network.

Notation:

  • Superscript $[l]$ denotes a quantity associated with the $l^{th}$ layer.
    • Example: $a^{[L]}$ is the $L^{th}$ layer activation. $W^{[L]}$ and $b^{[L]}$ are the $L^{th}$ layer parameters.
  • Superscript $(i)$ denotes a quantity associated with the $i^{th}$ example.
    • Example: $x^{(i)}$ is the $i^{th}$ training example.
  • Lowerscript $i$ denotes the $i^{th}$ entry of a vector.
    • Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the $l^{th}$ layer's activations.

The initialization for a deeper L-layer neural network is more complicated because there are many more weight matrices and bias vectors. When completing initialize_params, we made sure that our dimensions match between each layer. Given that $n^{[l]}$ is the number of units in layer $l$, if the size of our input $X$ is $(n^{[0]}, m)$ (with $m$ examples), then the shapes are as in the table below (a code sketch of the initialization follows the table):

             Shape of W                 Shape of b        Activation                                      Shape of Activation
Layer 1      $(n^{[1]}, n^{[0]})$       $(n^{[1]}, 1)$    $Z^{[1]} = W^{[1]} X + b^{[1]}$                 $(n^{[1]}, m)$
Layer 2      $(n^{[2]}, n^{[1]})$       $(n^{[2]}, 1)$    $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$           $(n^{[2]}, m)$
...          ...                        ...               ...                                             ...
Layer L-1    $(n^{[L-1]}, n^{[L-2]})$   $(n^{[L-1]}, 1)$  $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$   $(n^{[L-1]}, m)$
Layer L      $(n^{[L]}, n^{[L-1]})$     $(n^{[L]}, 1)$    $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$         $(n^{[L]}, m)$
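
A minimal sketch of what initialize_params might look like in NumPy, assuming small random weights scaled by 0.01 and zero biases (that scaling factor is a common convention, not something stated above):

```python
import numpy as np

def initialize_params(layer_dims):
    """Initialize parameters for an L-layer network.

    layer_dims -- list [n_0, n_1, ..., n_L] of layer sizes,
                  where n_0 is the input size.
    """
    params = {}
    L = len(layer_dims)  # number of layers, counting the input layer
    for l in range(1, L):
        # W[l] has shape (n_l, n_{l-1}); b[l] has shape (n_l, 1)
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```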

Remember that when we compute $WX + b$ in Python, it carries out broadcasting. For example, if:

$$W = \begin{bmatrix} j & k & l \\ m & n & o \\ p & q & r \end{bmatrix} \quad X = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \quad b = \begin{bmatrix} s \\ t \\ u \end{bmatrix}$$

Then $WX + b$ will be:

$$WX + b = \begin{bmatrix} ja + kd + lg + s & jb + ke + lh + s & jc + kf + li + s \\ ma + nd + og + t & mb + ne + oh + t & mc + nf + oi + t \\ pa + qd + rg + u & pb + qe + rh + u & pc + qf + ri + u \end{bmatrix}$$
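
A quick NumPy check of this broadcasting behavior (the concrete numbers are only illustrative):

```python
import numpy as np

W = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
X = np.ones((3, 4))                   # 4 examples, one per column
b = np.array([[10.], [20.], [30.]])   # column vector, shape (3, 1)

# W @ X has shape (3, 4); b has shape (3, 1),
# so NumPy broadcasts b across all 4 columns.
Z = W @ X + b
print(Z.shape)   # (3, 4)
print(Z[:, 0])   # [16. 35. 54.] -- each column gets b added
```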

Mathematical expression of the algorithm:

Forward propagation:

The linear forward module (vectorized over all the examples) computes the following equation:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$. And the activation functions:

$$\sigma(Z) = \frac{1}{1 + e^{-Z}} \qquad ReLU(Z) = \max(0, Z)$$
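
A minimal sketch of these pieces in NumPy (the helper names linear_forward, sigmoid, and relu are our own; the notebook's actual helpers may differ):

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """Linear part of a layer's forward step: Z = W A_prev + b."""
    return W @ A_prev + b

def sigmoid(Z):
    """sigmoid(Z) = 1 / (1 + e^{-Z}); typically used on the output layer."""
    return 1.0 / (1.0 + np.exp(-Z))

def relu(Z):
    """ReLU(Z) = max(0, Z); typically used on the hidden layers."""
    return np.maximum(0, Z)
```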

Cost function

We use the cross-entropy cost $J$:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - a^{[L](i)}\right) \right)$$

Note that $*$ denotes elementwise multiplication in the vectorized implementation.
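
A sketch of the corresponding computation, assuming AL and Y are row vectors of shape $(1, m)$ (the name compute_cost is our own):

```python
import numpy as np

def compute_cost(AL, Y):
    """Cross-entropy cost; * is elementwise multiplication."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(np.squeeze(cost))
```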

Backward propagation

The three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$ are computed using the input $dZ^{[l]}$. Here are the formulas we need:

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]}$$
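
These three formulas translate almost line-for-line into NumPy; a sketch, assuming dZ is the gradient of the cost with respect to $Z^{[l]}$ (the name linear_backward is our own):

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Gradients of the linear step, given dZ = dL/dZ[l]."""
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m                      # same shape as W[l]
    db = np.sum(dZ, axis=1, keepdims=True) / m    # same shape as b[l]
    dA_prev = W.T @ dZ                            # passed back to layer l-1
    return dA_prev, dW, db
```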

CONCLUSION

We achieved an accuracy of 65.2%, which is better than the accuracy of the Logistic Regression model in sklearn (57.6%). Our hyperparameters (written out as code after this list) are:

  • Learning rate = 0.001
  • Number of hidden layers = 5
  • Number of nodes in each hidden layer = [32, 64, 128, 256, 512]
  • Number of iterations = 2500
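
For reference, the final configuration as it might be written in code; the input size of 12288 (64 × 64 × 3 flattened images) is our assumption and is not stated above:

```python
# Final hyperparameter choices; the 12288-dim input (64 x 64 x 3
# flattened images) is an assumption, not stated in the write-up.
learning_rate = 0.001
hidden_layer_sizes = [32, 64, 128, 256, 512]     # 5 hidden layers
num_iterations = 2500
layer_dims = [12288] + hidden_layer_sizes + [1]  # input, hidden, output
```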