This repository contains a project written as part of the Machine Learning course at the University of Toronto.
It implements Stochastic Gradient Descent (SGD) with momentum β and learning rate α. The implemented algorithm is then used to approximately optimize the soft-SVM objective.
The algorithm is tested on the MNIST dataset. MNIST is a digit classification dataset consisting of 28 × 28 grayscale images of hand-drawn digits labelled from 0 to 9. In particular, this implementation solves a binary classification problem by classifying 4 vs. 9 (the hardest 1-vs-1 pair) and discards the rest of the classes.
The dataset is split into 80% train and 20% test, and each image is converted to a vector of size 784 by flattening.
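The preprocessing described above can be sketched as follows. Random arrays are used here as stand-ins for the MNIST images, since the loader is outside the scope of this snippet; any loader that yields 28 × 28 arrays with digit labels works the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28))      # placeholder for MNIST images
labels = rng.integers(0, 10, size=1000)  # placeholder for digit labels

# Keep only the 4-vs-9 binary problem and relabel to {-1, +1} for the SVM.
mask = (labels == 4) | (labels == 9)
X = images[mask].reshape(-1, 784)        # flatten 28x28 -> vector of size 784
y = np.where(labels[mask] == 9, 1.0, -1.0)

# 80% train / 20% test split.
n_train = int(0.8 * len(X))
perm = rng.permutation(len(X))
X_train, X_test = X[perm[:n_train]], X[perm[n_train:]]
y_train, y_test = y[perm[:n_train]], y[perm[n_train:]]
```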
For training and evaluating the SVM classifier, run `SVM_with_SGD`.
In this implementation, two SVM models were trained using stochastic gradient descent with a learning rate of α = 0.05, a penalty of C = 1.0, a minibatch size of m = 100, and T = 500 total iterations.
The stochastic gradient descent update with momentum β and learning rate α is given as follows:

$$v_{t+1} = \beta v_t - \alpha \nabla_w \mathcal{L}(w_t), \qquad w_{t+1} = w_t + v_{t+1}$$
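The momentum update above can be sketched as a small reusable routine. This is one common ("heavy-ball") formulation of momentum; the repository's exact update may place α differently, so treat this as an illustrative sketch rather than the project's implementation.

```python
import numpy as np

def sgd_momentum(grad, w0, alpha=0.05, beta=0.9, T=500):
    """Heavy-ball SGD: v <- beta*v - alpha*grad(w); w <- w + v."""
    w = w0.astype(float).copy()
    v = np.zeros_like(w)
    for _ in range(T):
        v = beta * v - alpha * grad(w)
        w = w + v
    return w

# Sanity check on the quadratic f(w) = ||w||^2 / 2, whose gradient is w
# and whose minimizer is the origin.
w_star = sgd_momentum(lambda w: w, np.array([5.0, -3.0]))
```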
- The soft-SVM objective is given as:

$$J(w, b) = \frac{1}{2}\lVert w \rVert^2 + \frac{C}{N} \sum_{i=1}^{N} \max\big(0,\ 1 - y_i (w^\top x_i + b)\big)$$
where w are the SVM model parameters, b is the bias term, C is the penalty parameter for misclassifying the classes, and N is the batch size. The first term in the objective is the regularization term, while the second is known as the hinge loss.
- The gradient of the hinge loss was calculated using sub-gradients:

$$\frac{\partial}{\partial w} \max\big(0,\ 1 - y (w^\top x + b)\big) = \begin{cases} -\,y\,x & \text{if } y (w^\top x + b) < 1 \\ 0 & \text{otherwise} \end{cases}$$
Note that the gradient at the "kink" of the hinge loss was taken to be zero. The overall gradient of the soft-SVM objective was then calculated as follows:

$$\nabla_w J = w - \frac{C}{N} \sum_{i:\ y_i (w^\top x_i + b) < 1} y_i\, x_i, \qquad \frac{\partial J}{\partial b} = -\frac{C}{N} \sum_{i:\ y_i (w^\top x_i + b) < 1} y_i
$$
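Putting the pieces together, a minimal training sketch looks like the following. It uses the hyperparameters quoted above (α = 0.05, C = 1.0, m = 100, T = 500) plus an assumed β = 0.9, and trains on a synthetic linearly separable 2-D problem standing in for the 4-vs-9 MNIST subset; the actual repository script is `SVM_with_SGD`.

```python
import numpy as np

def svm_objective_grad(w, b, X, y, C):
    """Subgradient of (1/2)||w||^2 + (C/N) * sum max(0, 1 - y_i (w.x_i + b)),
    taking the gradient at the hinge's kink to be zero."""
    margins = y * (X @ w + b)
    active = margins < 1.0  # points violating the margin contribute
    grad_w = w - (C / len(y)) * (y[active, None] * X[active]).sum(axis=0)
    grad_b = -(C / len(y)) * y[active].sum()
    return grad_w, grad_b

# Synthetic separable data in place of the MNIST 4-vs-9 subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# Minibatch SGD with momentum; beta = 0.9 is an assumed value here.
alpha, beta, C, m, T = 0.05, 0.9, 1.0, 100, 500
w, b = np.zeros(X.shape[1]), 0.0
vw, vb = np.zeros_like(w), 0.0
for _ in range(T):
    idx = rng.choice(len(X), size=m, replace=False)
    gw, gb = svm_objective_grad(w, b, X[idx], y[idx], C)
    vw, vb = beta * vw - alpha * gw, beta * vb - alpha * gb
    w, b = w + vw, b + vb

accuracy = np.mean(np.sign(X @ w + b) == y)
```

On this easy synthetic problem the learned hyperplane recovers the separating direction, so training accuracy approaches 1.0.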