SVM with Stochastic Gradient Descent

This repository contains a project that was written as part of the Machine Learning course at University of Toronto.

This is an implementation of Stochastic Gradient Descent with momentum β and learning rate α. The implemented algorithm is then used to approximately optimize the SVM objective.

The algorithm is tested on the MNIST dataset. MNIST is a digit classification dataset consisting of 28 × 28 grayscale images of handwritten digits labelled from 0 to 9. In particular, this implementation solves a binary classification problem by classifying 4 vs. 9 (the hardest 1-vs-1 pair) and discards the rest of the classes.

The dataset is split into 80% train and 20% test, and each 28 × 28 image is flattened into a vector of size 784.
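The repository's own data-loading code is not reproduced here; the following is a minimal sketch of the preprocessing described above, assuming MNIST is fetched with scikit-learn's fetch_openml (the variable names and the use of train_test_split are illustrative, not the project's actual code):

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Fetch MNIST; each row is already a flattened 784-dimensional image.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X = mnist.data.astype(np.float64)
y = mnist.target.astype(int)

# Keep only the digits 4 and 9 and relabel them as -1 / +1 for the SVM.
mask = (y == 4) | (y == 9)
X, y = X[mask], np.where(y[mask] == 9, 1, -1)

# 80% train / 20% test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```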

For training and evaluating the SVM classifier, run SVM_with_SGD.

Description of the code implementation:

  • In this implementation, two SVM models were trained using stochastic gradient descent with a learning rate of α = 0.05, a penalty parameter of C = 1.0, a minibatch size of m = 100, and T = 500 total iterations.

  • The stochastic gradient descent update with momentum β and learning rate α is given as follows:

$$v_{t+1} = \beta v_t - \alpha \nabla f(w_t)$$

$$w_{t+1} = w_t + v_{t+1}$$
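As a sketch of how these two updates translate into code (the function and argument names below are illustrative, not taken from the repository):

```python
import numpy as np

def sgd_with_momentum(w0, grad_fn, alpha, beta, num_iters):
    """Minimize f via the updates v <- beta*v - alpha*grad(w), w <- w + v."""
    w = w0.copy()
    v = np.zeros_like(w)
    for _ in range(num_iters):
        v = beta * v - alpha * grad_fn(w)  # velocity update with momentum beta
        w = w + v                          # parameter update
    return w
```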

  • The SVM objective is given as:

$$\min_{w,\,b} \;\; \frac{1}{2}\lVert w \rVert^2 + \frac{C}{N} \sum_{i=1}^{N} \max\left(0,\; 1 - y^{(i)}\left(w^\top x^{(i)} + b\right)\right)$$

where w are the SVM model parameters, b is the bias term, C is the penalty parameter for misclassified examples, and N is the batch size. The first term in the objective is the regularization term, while the second is known as the hinge loss.
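A minimal sketch of how this objective can be evaluated, assuming the labels are encoded as ±1 (the function and variable names are illustrative):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Soft-SVM objective: 0.5*||w||^2 + (C/N) * sum of hinge losses."""
    margins = y * (X @ w + b)               # y_i (w^T x_i + b) for each example
    hinge = np.maximum(0.0, 1.0 - margins)  # hinge loss per example
    return 0.5 * np.dot(w, w) + C * np.mean(hinge)
```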

  • The gradient of the hinge loss was calculated using sub-gradients:

$$\frac{\partial}{\partial w} \max\left(0,\; 1 - y^{(i)}\left(w^\top x^{(i)} + b\right)\right) = \begin{cases} -\,y^{(i)} x^{(i)} & \text{if } y^{(i)}\left(w^\top x^{(i)} + b\right) < 1 \\ 0 & \text{otherwise} \end{cases}$$

Note that the gradient at the "kink" of the hinge loss was taken to be zero. The overall gradient of the soft-SVM objective was calculated as follows:

$$\nabla_w f(w, b) = w + \frac{C}{N} \sum_{i=1}^{N} \frac{\partial}{\partial w} \max\left(0,\; 1 - y^{(i)}\left(w^\top x^{(i)} + b\right)\right)$$
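Putting the pieces together, the following is a sketch of the minibatch sub-gradient and of an SGD-with-momentum training loop using the hyperparameters listed above (α = 0.05, C = 1.0, m = 100, T = 500); β is left as a free argument since its value is not stated here, and all names, the random seed, and the bias-term gradient are illustrative assumptions rather than the repository's exact code:

```python
import numpy as np

def svm_gradient(w, b, X, y, C):
    """Sub-gradient of the soft-SVM objective over a (mini)batch (X, y)."""
    N = X.shape[0]
    margins = y * (X @ w + b)
    active = margins < 1.0                   # examples that violate the margin
    # Hinge-loss sub-gradient: -y_i x_i on active examples, 0 at the kink.
    grad_hinge_w = -(y[active][:, None] * X[active]).sum(axis=0)
    grad_hinge_b = -y[active].sum()
    grad_w = w + (C / N) * grad_hinge_w      # regularization gradient plus hinge term
    grad_b = (C / N) * grad_hinge_b
    return grad_w, grad_b

def train_svm(X, y, beta, alpha=0.05, C=1.0, m=100, T=500, seed=0):
    """Minibatch SGD with momentum beta on the soft-SVM objective."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    v_w, v_b = np.zeros_like(w), 0.0
    for _ in range(T):
        idx = rng.choice(X.shape[0], size=m, replace=False)
        grad_w, grad_b = svm_gradient(w, b, X[idx], y[idx], C)
        v_w = beta * v_w - alpha * grad_w    # momentum update for w
        v_b = beta * v_b - alpha * grad_b    # momentum update for b
        w, b = w + v_w, b + v_b
    return w, b
```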