This repository contains a project written as part of the Machine Learning course at the University of Toronto.
It implements Stochastic Gradient Descent (SGD) with momentum β and learning rate α. The implemented algorithm is then used to approximately optimize the soft-SVM objective.
The algorithm is tested on the MNIST dataset. MNIST is a digit classification dataset consisting of 28 × 28 grayscale images of hand-drawn digits labelled from 0 to 9. In particular, this implementation solves a binary classification problem by classifying 4 vs. 9 (the hardest 1-vs-1 pair) and discards the rest of the classes.
The dataset is split into 80% train and 20% test, and each image is converted to a vector of size 784 by flattening.
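The preprocessing described above can be sketched as follows. Random arrays are used here as stand-ins for the MNIST images, since the loader is outside the scope of this snippet; any loader that yields 28 × 28 arrays with digit labels works the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28))      # placeholder for MNIST images
labels = rng.integers(0, 10, size=1000)  # placeholder for digit labels

# Keep only the 4-vs-9 binary problem and relabel to {-1, +1} for the SVM.
mask = (labels == 4) | (labels == 9)
X = images[mask].reshape(-1, 784)        # flatten 28x28 -> vector of size 784
y = np.where(labels[mask] == 9, 1.0, -1.0)

# 80% train / 20% test split.
n_train = int(0.8 * len(X))
perm = rng.permutation(len(X))
X_train, X_test = X[perm[:n_train]], X[perm[n_train:]]
y_train, y_test = y[perm[:n_train]], y[perm[n_train:]]
```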
For training and evaluating the SVM classifier, run `SVM_with_SGD`.
In this implementation, two SVM models were trained using stochastic gradient descent with a learning rate of α = 0.05, a penalty of C = 1.0, a minibatch size of m = 100, and T = 500 total iterations.
The stochastic gradient descent update with momentum β and learning rate α is given as follows:

$$v_{t+1} = \beta v_t - \alpha \nabla_w \mathcal{L}(w_t), \qquad w_{t+1} = w_t + v_{t+1}$$
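The momentum update above can be sketched as a small reusable routine. This is one common ("heavy-ball") formulation of momentum; the repository's exact update may place α differently, so treat this as an illustrative sketch rather than the project's implementation.

```python
import numpy as np

def sgd_momentum(grad, w0, alpha=0.05, beta=0.9, T=500):
    """Heavy-ball SGD: v <- beta*v - alpha*grad(w); w <- w + v."""
    w = w0.astype(float).copy()
    v = np.zeros_like(w)
    for _ in range(T):
        v = beta * v - alpha * grad(w)
        w = w + v
    return w

# Sanity check on the quadratic f(w) = ||w||^2 / 2, whose gradient is w
# and whose minimizer is the origin.
w_star = sgd_momentum(lambda w: w, np.array([5.0, -3.0]))
```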
- The soft-SVM objective is given as:

$$J(w, b) = \frac{1}{2}\lVert w \rVert^2 + \frac{C}{N} \sum_{i=1}^{N} \max\big(0,\ 1 - y_i (w^\top x_i + b)\big)$$
where w are the SVM model parameters, b is the bias term, C is the penalty parameter for misclassifying the classes, and N is the batch size. The first term in the objective is the regularization term, while the second is known as the hinge loss.
- The gradient of the hinge loss was calculated using sub-gradients:

$$\frac{\partial}{\partial w} \max\big(0,\ 1 - y (w^\top x + b)\big) = \begin{cases} -\,y\,x & \text{if } y (w^\top x + b) < 1 \\ 0 & \text{otherwise} \end{cases}$$
Note that the gradient at the "kink" of the hinge loss was taken to be zero. The overall gradient of the soft-SVM objective was then calculated as follows:

$$\nabla_w J = w - \frac{C}{N} \sum_{i:\ y_i (w^\top x_i + b) < 1} y_i\, x_i, \qquad \frac{\partial J}{\partial b} = -\frac{C}{N} \sum_{i:\ y_i (w^\top x_i + b) < 1} y_i
$$
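Putting the pieces together, a minimal training sketch looks like the following. It uses the hyperparameters quoted above (α = 0.05, C = 1.0, m = 100, T = 500) plus an assumed β = 0.9, and trains on a synthetic linearly separable 2-D problem standing in for the 4-vs-9 MNIST subset; the actual repository script is `SVM_with_SGD`.

```python
import numpy as np

def svm_objective_grad(w, b, X, y, C):
    """Subgradient of (1/2)||w||^2 + (C/N) * sum max(0, 1 - y_i (w.x_i + b)),
    taking the gradient at the hinge's kink to be zero."""
    margins = y * (X @ w + b)
    active = margins < 1.0  # points violating the margin contribute
    grad_w = w - (C / len(y)) * (y[active, None] * X[active]).sum(axis=0)
    grad_b = -(C / len(y)) * y[active].sum()
    return grad_w, grad_b

# Synthetic separable data in place of the MNIST 4-vs-9 subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# Minibatch SGD with momentum; beta = 0.9 is an assumed value here.
alpha, beta, C, m, T = 0.05, 0.9, 1.0, 100, 500
w, b = np.zeros(X.shape[1]), 0.0
vw, vb = np.zeros_like(w), 0.0
for _ in range(T):
    idx = rng.choice(len(X), size=m, replace=False)
    gw, gb = svm_objective_grad(w, b, X[idx], y[idx], C)
    vw, vb = beta * vw - alpha * gw, beta * vb - alpha * gb
    w, b = w + vw, b + vb

accuracy = np.mean(np.sign(X @ w + b) == y)
```

On this easy synthetic problem the learned hyperplane recovers the separating direction, so training accuracy approaches 1.0.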