In this project, I'll be building a simple neural network with a single hidden layer that classifies handwritten digits. I'll be making use of the MNIST dataset. The images are 28x28 (single channel). The only package I'll be making use of is numpy.
The data preprocessing includes:
- flattening and normalizing the input images (dividing each pixel by 255)
- shuffling the training set to prevent ordering bias and help training converge
- one-hot encoding the target labels (sketched below)
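A minimal sketch of these steps in numpy, assuming the raw arrays `train_images` of shape (60000, 28, 28) and `train_labels` of shape (60000,) have already been loaded (the loading code, and the names here, are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(images, labels, num_classes=10):
    # Flatten 28x28 images to 784-dim vectors, scale pixels to [0, 1],
    # and stack examples as columns to match the vectorized equations below
    X = images.reshape(images.shape[0], -1).astype(np.float64).T / 255.0  # (784, m)
    # One-hot encode the integer labels into a (10, m) matrix
    Y = np.zeros((num_classes, labels.shape[0]))
    Y[labels, np.arange(labels.shape[0])] = 1.0
    # Shuffle the examples (columns) and their labels together
    perm = rng.permutation(X.shape[1])
    return X[:, perm], Y[:, perm]
```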
We will be using a sigmoid activation, given by:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
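In numpy this is a one-liner; here is a sketch of the activation together with its derivative, which the back propagation step below will need:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)
```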
For our cost function, we will be using cross-entropy. For a single example, with one-hot label $y$ and output activations $a$, the cost is:

$$L = -\sum_{k} y_k \log a_k$$
And for a set of $m$ examples:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k} y_k^{(i)} \log a_k^{(i)}$$
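As a sketch in numpy, with `A` the (10, m) matrix of output activations and `Y` the one-hot labels (the function name and the `eps` guard against log(0) are my additions):

```python
import numpy as np

def cross_entropy_cost(A, Y, eps=1e-12):
    # Average cross-entropy over the m examples (columns)
    m = Y.shape[1]
    return -np.sum(Y * np.log(A + eps)) / m
```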
For the layers (excluding the final layer), the forward propagation is given by

$$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = \sigma(z^{[l]})$$

By stacking examples as columns, we vectorize the input and get the forward propagation equations

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = \sigma(Z^{[l]})$$

For our final layer (the softmax layer), the final activations are the exponentials of its z-values, normalized so that they sum to one:

$$a_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$
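Putting these together, a sketch of the vectorized forward pass for one hidden sigmoid layer followed by a softmax output, reusing the `sigmoid` helper above (the parameter names and dictionary layout are my own convention):

```python
import numpy as np

def softmax(Z):
    # Exponentiate the z-values and normalize each column to sum to 1;
    # subtracting the column max keeps the exponentials from overflowing
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def forward(X, params):
    # One hidden sigmoid layer followed by a softmax output layer
    Z1 = params["W1"] @ X + params["b1"]   # (n_hidden, m)
    A1 = sigmoid(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]  # (10, m)
    A2 = softmax(Z2)
    return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
```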
The back propagation is given by:

$$\delta^{[L]} = a^{[L]} - y, \qquad \delta^{[l]} = (W^{[l+1]})^T \delta^{[l+1]} \odot \sigma'(z^{[l]}), \qquad \frac{\partial L}{\partial W^{[l]}} = \delta^{[l]} \, (a^{[l-1]})^T$$

For the vectorized form with $m$ training examples (stacking the per-example errors $\delta^{[l](i)}$ as the columns of $\Delta^{[l]}$):

$$\frac{\partial J}{\partial W^{[l]}} = \frac{1}{m} \Delta^{[l]} (A^{[l-1]})^T$$

Similarly we can calculate the bias term,

$$\frac{\partial L}{\partial b^{[l]}} = \delta^{[l]},$$

and in vectorized form

$$\frac{\partial J}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{[l](i)}$$

For a more in-depth explanation, see these slides.
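A sketch of these vectorized gradients, reusing the cache returned by `forward` and the `sigmoid_prime` helper from above:

```python
def backward(X, Y, params, cache):
    # Gradients of the average cost J over the m examples (columns)
    m = X.shape[1]
    # Softmax + cross-entropy simplifies the output error to A2 - Y
    dZ2 = cache["A2"] - Y
    dW2 = dZ2 @ cache["A1"].T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    # Propagate the error back through the hidden sigmoid layer
    dZ1 = (params["W2"].T @ dZ2) * sigmoid_prime(cache["Z1"])
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```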
For training, we implement mini-batch gradient descent with momentum (a beta value of 0.9) and a batch size of 128. We also initialize the weights to 1/n, where n is the number of inputs feeding into that layer. We train for 9 epochs.
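A sketch of that training loop. The hidden-layer size, the learning rate, the exact momentum form, and the reading of the 1/n initialization as random weights scaled by 1/n are not stated above, so treat them all as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in=784, n_hidden=64, n_out=10):
    # Scale each layer's weights by 1/n, where n is the number of inputs
    # feeding into that layer (hidden size 64 is an assumption)
    return {
        "W1": rng.standard_normal((n_hidden, n_in)) / n_in,
        "b1": np.zeros((n_hidden, 1)),
        "W2": rng.standard_normal((n_out, n_hidden)) / n_hidden,
        "b2": np.zeros((n_out, 1)),
    }

def train(X, Y, epochs=9, batch_size=128, lr=0.5, beta=0.9):
    params = init_params()
    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    m = X.shape[1]
    for _ in range(epochs):
        for start in range(0, m, batch_size):
            xb = X[:, start:start + batch_size]
            yb = Y[:, start:start + batch_size]
            grads = backward(xb, yb, params, forward(xb, params))
            for k in params:
                # Momentum update: v <- beta * v + (1 - beta) * grad
                velocity[k] = beta * velocity[k] + (1 - beta) * grads[k]
                params[k] -= lr * velocity[k]
    return params
```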
After training is done, for each test instance we take the argmax of the final layer's activations, compare it to the labelled data, and achieve an accuracy of 97.53%.
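The evaluation step as a sketch, assuming `test_labels` holds the integer test labels:

```python
def accuracy(X, labels, params):
    # Predicted class = argmax over the 10 output activations per column
    preds = forward(X, params)["A2"].argmax(axis=0)
    return (preds == labels).mean()

# Usage (hypothetical variable names):
# params = train(X_train, Y_train)
# print(accuracy(X_test, test_labels, params))
```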