In this project, I'll be building a simple neural network with a single hidden layer that classifies handwritten digits. I'll be making use of the MNIST dataset. The images are 28x28 (single channel). The only package I'll be making use of is numpy.
The data preprocessing includes:
- flattening and normalizing the input images (dividing each pixel by 255)
- shuffling the training set to prevent ordering bias and help training converge
- one-hot encoding the target labels (sketched below)
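A minimal sketch of these steps in numpy, assuming the raw arrays `train_images` of shape (60000, 28, 28) and `train_labels` of shape (60000,) have already been loaded (the loading code, and the names here, are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(images, labels, num_classes=10):
    # Flatten 28x28 images to 784-dim vectors, scale pixels to [0, 1],
    # and stack examples as columns to match the vectorized equations below
    X = images.reshape(images.shape[0], -1).astype(np.float64).T / 255.0  # (784, m)
    # One-hot encode the integer labels into a (10, m) matrix
    Y = np.zeros((num_classes, labels.shape[0]))
    Y[labels, np.arange(labels.shape[0])] = 1.0
    # Shuffle the examples (columns) and their labels together
    perm = rng.permutation(X.shape[1])
    return X[:, perm], Y[:, perm]
```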
We will be using a sigmoid activation, given by:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
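In numpy this is a one-liner; here is a sketch of the activation together with its derivative, which the back propagation step below will need:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)
```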
For our cost function, we will be using cross-entropy. For a single example, with one-hot label $y$ and output activations $a$, the cost is:

$$L = -\sum_{k} y_k \log a_k$$
And for a set of $m$ examples:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k} y_k^{(i)} \log a_k^{(i)}$$
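As a sketch in numpy, with `A` the (10, m) matrix of output activations and `Y` the one-hot labels (the function name and the `eps` guard against log(0) are my additions):

```python
import numpy as np

def cross_entropy_cost(A, Y, eps=1e-12):
    # Average cross-entropy over the m examples (columns)
    m = Y.shape[1]
    return -np.sum(Y * np.log(A + eps)) / m
```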
For the layers (excluding the final layer), the forward propagation is given by

$$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = \sigma(z^{[l]})$$

By stacking examples as columns, we vectorize the input and get the forward propagation equations

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = \sigma(Z^{[l]})$$

For our final layer (the softmax layer), the final activations are the exponentials of its z-values, normalized so that they sum to one:

$$a_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$
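Putting these together, a sketch of the vectorized forward pass for one hidden sigmoid layer followed by a softmax output, reusing the `sigmoid` helper above (the parameter names and dictionary layout are my own convention):

```python
import numpy as np

def softmax(Z):
    # Exponentiate the z-values and normalize each column to sum to 1;
    # subtracting the column max keeps the exponentials from overflowing
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def forward(X, params):
    # One hidden sigmoid layer followed by a softmax output layer
    Z1 = params["W1"] @ X + params["b1"]   # (n_hidden, m)
    A1 = sigmoid(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]  # (10, m)
    A2 = softmax(Z2)
    return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
```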
The back propagation is given by:

$$\delta^{[L]} = a^{[L]} - y, \qquad \delta^{[l]} = (W^{[l+1]})^T \delta^{[l+1]} \odot \sigma'(z^{[l]}), \qquad \frac{\partial L}{\partial W^{[l]}} = \delta^{[l]} \, (a^{[l-1]})^T$$

For the vectorized form with $m$ training examples (stacking the per-example errors $\delta^{[l](i)}$ as the columns of $\Delta^{[l]}$):

$$\frac{\partial J}{\partial W^{[l]}} = \frac{1}{m} \Delta^{[l]} (A^{[l-1]})^T$$

Similarly we can calculate the bias term,

$$\frac{\partial L}{\partial b^{[l]}} = \delta^{[l]},$$

and in vectorized form

$$\frac{\partial J}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{[l](i)}$$

For a more in-depth explanation, see these slides.
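A sketch of these vectorized gradients, reusing the cache returned by `forward` and the `sigmoid_prime` helper from above:

```python
def backward(X, Y, params, cache):
    # Gradients of the average cost J over the m examples (columns)
    m = X.shape[1]
    # Softmax + cross-entropy simplifies the output error to A2 - Y
    dZ2 = cache["A2"] - Y
    dW2 = dZ2 @ cache["A1"].T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    # Propagate the error back through the hidden sigmoid layer
    dZ1 = (params["W2"].T @ dZ2) * sigmoid_prime(cache["Z1"])
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```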
For training, we implement mini-batch gradient descent with momentum (a beta value of 0.9) and a batch size of 128. We also initialize the weights to 1/n, where n is the number of inputs feeding into that layer. We train for 9 epochs.
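A sketch of that training loop. The hidden-layer size, the learning rate, the exact momentum form, and the reading of the 1/n initialization as random weights scaled by 1/n are not stated above, so treat them all as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in=784, n_hidden=64, n_out=10):
    # Scale each layer's weights by 1/n, where n is the number of inputs
    # feeding into that layer (hidden size 64 is an assumption)
    return {
        "W1": rng.standard_normal((n_hidden, n_in)) / n_in,
        "b1": np.zeros((n_hidden, 1)),
        "W2": rng.standard_normal((n_out, n_hidden)) / n_hidden,
        "b2": np.zeros((n_out, 1)),
    }

def train(X, Y, epochs=9, batch_size=128, lr=0.5, beta=0.9):
    params = init_params()
    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    m = X.shape[1]
    for _ in range(epochs):
        for start in range(0, m, batch_size):
            xb = X[:, start:start + batch_size]
            yb = Y[:, start:start + batch_size]
            grads = backward(xb, yb, params, forward(xb, params))
            for k in params:
                # Momentum update: v <- beta * v + (1 - beta) * grad
                velocity[k] = beta * velocity[k] + (1 - beta) * grads[k]
                params[k] -= lr * velocity[k]
    return params
```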
After training is done, for each test instance we take the argmax of the final layer's activations, compare it to the labelled data, and achieve an accuracy of 97.53%.
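The evaluation step as a sketch, assuming `test_labels` holds the integer test labels:

```python
def accuracy(X, labels, params):
    # Predicted class = argmax over the 10 output activations per column
    preds = forward(X, params)["A2"].argmax(axis=0)
    return (preds == labels).mean()

# Usage (hypothetical variable names):
# params = train(X_train, Y_train)
# print(accuracy(X_test, test_labels, params))
```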