Building a neural network from scratch

In this project, I'll build a simple, single-layered neural network that classifies handwritten digits, using the MNIST dataset. The images are 28x28 (single channel). The only package I'll be using is numpy.

Preprocessing

The data preprocessing includes:

  • flattening and normalizing the input images (dividing pixel values by 255)
  • shuffling the training set to prevent ordering bias and help training converge
  • one-hot encoding the target labels (a sketch of these steps follows below)
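
A rough numpy sketch of these preprocessing steps; the function name, argument names, and array shapes are my assumptions about the raw MNIST arrays, not the project's actual code:

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    # images: (m, 28, 28) uint8 array, labels: (m,) integer array
    # (these names and shapes are assumptions about the raw MNIST data)
    m = images.shape[0]

    # Flatten each 28x28 image into a 784-vector and scale pixels to [0, 1]
    X = images.reshape(m, -1).astype(np.float64) / 255.0

    # Shuffle examples and labels together
    perm = np.random.permutation(m)
    X, labels = X[perm], labels[perm]

    # One-hot encode the labels: (m,) -> (m, 10)
    Y = np.zeros((m, num_classes))
    Y[np.arange(m), labels] = 1.0

    # Transpose so each column is one example, matching the vectorized
    # equations used later: X -> (784, m), Y -> (10, m)
    return X.T, Y.T
```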

Background

We will be using a sigmoid activation, given by:

    σ(z) = 1 / (1 + e^(-z))

For our cost function, we will be using cross-entropy. For a single example with one-hot label y and predicted probabilities ŷ, the cost will be:

    L = -Σ_k y_k log(ŷ_k)

And for a set of m examples:

    J = -(1/m) Σ_{i=1..m} Σ_k y_k^(i) log(ŷ_k^(i))

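A minimal numpy sketch of these two definitions, assuming arrays of shape (classes, m) with examples as columns and a small epsilon to guard against log(0); the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(Y_hat, Y):
    # Average cross-entropy over m examples.
    # Y_hat, Y: arrays of shape (num_classes, m); Y is one-hot.
    m = Y.shape[1]
    eps = 1e-12  # avoids log(0) for confident predictions
    return -np.sum(Y * np.log(Y_hat + eps)) / m
```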

Forward propagation

For the layers (excluding the final layer), the forward propagation is given by

    z^[l] = W^[l] a^[l-1] + b^[l],    a^[l] = σ(z^[l])

By stacking examples as columns, we vectorize the input and get a forward propagation equation of

    Z^[l] = W^[l] A^[l-1] + b^[l],    A^[l] = σ(Z^[l])

For our final layer (the softmax layer), the final activations are the normalized exponentials of its z-values:

    a_k^[L] = e^(z_k^[L]) / Σ_j e^(z_j^[L])
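
Putting the forward pass together as a numpy sketch, assuming one hidden sigmoid layer feeding a softmax output layer; the parameter names and shapes in the params dict are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Shift by the column-wise max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def forward(X, params):
    # X: (784, m) input batch (columns are examples).
    # params: dict with W1 (h, 784), b1 (h, 1), W2 (10, h), b2 (10, 1).
    Z1 = params["W1"] @ X + params["b1"]
    A1 = sigmoid(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = softmax(Z2)
    # Cache intermediate values needed by backpropagation
    return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
```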

Backwards propagation

The back propagation is given by (writing dW^[l] for the gradient of the cost with respect to W^[l]):

    dW^[l] = dz^[l] (a^[l-1])^T

where dz^[L] = a^[L] - y at the softmax output (the softmax and cross-entropy derivatives combine into this simple form) and dz^[l] = (W^[l+1])^T dz^[l+1] ⊙ σ'(z^[l]) for the earlier layers.

For the vectorized form with m training examples:

    dW^[l] = (1/m) dZ^[l] (A^[l-1])^T

Similarly, we can calculate the bias term

    db^[l] = dz^[l]

and in vectorized form

    db^[l] = (1/m) Σ_i dz^[l](i)    (the row-wise mean of dZ^[l])
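
A matching numpy sketch of these gradients for the same two-layer setup as above; the dZ2 = A2 - Y line uses the combined softmax/cross-entropy derivative, and the names and shapes remain illustrative:

```python
import numpy as np

def backward(X, Y, params, cache):
    # X: (784, m), Y: (10, m) one-hot targets; params/cache as in the forward sketch.
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]

    # Output layer: softmax + cross-entropy gives dZ2 = A2 - Y directly
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m

    # Hidden layer: propagate through W2 and the sigmoid derivative A1 * (1 - A1)
    dZ1 = (params["W2"].T @ dZ2) * A1 * (1.0 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```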

For a more in-depth explanation, see these slides.

Training

For training, we implement mini-batch gradient descent with momentum (a beta value of 0.9) and a batch size of 128. We also initialize the weights to 1/n, where n is the number of inputs feeding into that layer. We train for 9 epochs.
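
A sketch of this training loop, reusing the forward and backward functions from the earlier sketches. The hidden size, learning rate, and the reading of the 1/n initialization as a random init with variance 1/n are my assumptions; the beta of 0.9, batch size of 128, and 9 epochs follow the text:

```python
import numpy as np

def train(X, Y, hidden=64, lr=0.5, beta=0.9, batch_size=128, epochs=9, seed=0):
    # X: (784, m) inputs, Y: (10, m) one-hot targets (columns are examples).
    rng = np.random.default_rng(seed)
    n_in, m = X.shape
    n_out = Y.shape[0]

    # Weights scaled by 1/fan-in (interpreted as a 1/n-variance init), biases zero
    params = {
        "W1": rng.standard_normal((hidden, n_in)) * np.sqrt(1.0 / n_in),
        "b1": np.zeros((hidden, 1)),
        "W2": rng.standard_normal((n_out, hidden)) * np.sqrt(1.0 / hidden),
        "b2": np.zeros((n_out, 1)),
    }
    # Momentum buffer, one per parameter
    velocity = {k: np.zeros_like(v) for k, v in params.items()}

    for epoch in range(epochs):
        perm = rng.permutation(m)
        for start in range(0, m, batch_size):
            idx = perm[start:start + batch_size]
            Xb, Yb = X[:, idx], Y[:, idx]

            cache = forward(Xb, params)
            grads = backward(Xb, Yb, params, cache)

            # Momentum update: v <- beta * v + (1 - beta) * grad, then step
            for k in params:
                velocity[k] = beta * velocity[k] + (1.0 - beta) * grads[k]
                params[k] -= lr * velocity[k]

    return params
```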

Evaluation

After training, we take the argmax of the final-layer activations for each test instance, compare it to the labelled data, and achieve an accuracy of 97.53%.
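
A small accuracy helper in the same vein, reusing the forward sketch above (names are illustrative):

```python
import numpy as np

def accuracy(X_test, Y_test, params):
    # Predicted class = argmax over the softmax outputs (rows are classes)
    A2 = forward(X_test, params)["A2"]
    predictions = A2.argmax(axis=0)
    labels = Y_test.argmax(axis=0)  # recover integer labels from one-hot targets
    return (predictions == labels).mean()
```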