Handwritten-Digit-Classification

This repository contains implementations of the kNN, naive Bayes, and conditional Gaussian classifiers for labelling images of handwritten digits from the MNIST dataset. Each image is 8 x 8 pixels and is represented as a 64-dimensional vector that lists the pixel values in raster scan order. The images are grayscale, with pixel values between 0 and 1.
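As a small illustration of the raster scan layout, the sketch below converts between a 64-dimensional vector and an 8 x 8 image. It assumes NumPy and row-major order (rows listed top to bottom, each row left to right); the helper names `vector_to_image` and `image_to_vector` are hypothetical and are not taken from this repository.

```python
import numpy as np

def vector_to_image(v, side=8):
    """Reshape a flat pixel vector into a side x side image (row-major order)."""
    return np.asarray(v).reshape(side, side)

def image_to_vector(img):
    """Flatten an image back into a vector, row by row."""
    return np.asarray(img).reshape(-1)

# Pixel 8 of the vector is the first pixel of the second row of the image.
v = np.arange(64) / 63.0
img = vector_to_image(v)
```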

There are 700 training cases and 400 test cases for each digit; they can be found in a2digits.zip.

data.py loads data from a given zip file or directory, and loads the pixels of given digits from a test/train set.

RUNNING THE CODE:

For loading and plotting the MNIST dataset, run load_and_plot.py.

For training and evaluating the kNN classifier, run kNN.py.

For training and evaluating the conditional Gaussian classifier, run conditional_gaussian.py.

For training and evaluating the Naive Bayes classifier, run naive_bayes.py.

Description of code implementation:

1) kNN classifier

  • The kNN classifier using Euclidean distance was evaluated for K = 1 and K = 15.

  • For K > 1 the kNN vote may end in a tie. In that case K was decreased by 1 and the vote was repeated with the reduced K, continuing until the tie was broken or K = 1 was reached.

  • An optimal K in the 1-15 range was found using 10-fold cross-validation.
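The steps above can be sketched as follows. This is a minimal illustration assuming NumPy, not the code in kNN.py: the function names `knn_predict` and `cross_validate_k`, the random fold assignment, and the `seed` parameter are all assumptions made for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Label one query vector by majority vote among its k nearest neighbours."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    # Euclidean distance from the query to every training vector.
    dists = np.sqrt(((train_x - np.asarray(query, dtype=float)) ** 2).sum(axis=1))
    order = np.argsort(dists, kind="stable")
    while k >= 1:
        votes = Counter(train_y[order[:k]].tolist()).most_common()
        # Accept the vote only if the top label strictly beats the runner-up.
        if len(votes) == 1 or votes[0][1] > votes[1][1]:
            return votes[0][0]
        k -= 1  # tie: drop the farthest neighbour and vote again
    raise ValueError("k must be at least 1")

def cross_validate_k(x, y, k_values, n_folds=10, seed=0):
    """Return the k with the highest mean validation accuracy, plus all scores."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    # Assumption: folds are formed from a random permutation of the indices.
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    mean_acc = {}
    for k in k_values:
        accs = []
        for i in range(n_folds):
            val = folds[i]
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            preds = [knn_predict(x[train], y[train], q, k) for q in x[val]]
            accs.append(np.mean(np.array(preds) == y[val]))
        mean_acc[k] = float(np.mean(accs))
    best_k = max(mean_acc, key=mean_acc.get)
    return best_k, mean_acc
```

With the digit data this would be called as `cross_validate_k(train_x, train_y, range(1, 16))`, where `train_x` and `train_y` stand for whatever arrays the loader returns. If several values of K reach the same accuracy, this sketch keeps the smallest one.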

2) Conditional Gaussian classifier

  • A conditional Gaussian with a separate, full covariance matrix was fitted for each class using MLE. The conditional multivariate Gaussian probability density is:

eq0

where eq1
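A minimal sketch of fitting and evaluating such a model is given below, assuming NumPy. It is not the code in conditional_gaussian.py. The function names are hypothetical, the small `ridge` term added to each covariance diagonal is an assumption made here for numerical stability, and prediction assumes a uniform class prior, which matches the equal number of training cases per digit.

```python
import numpy as np

def fit_class_gaussians(x, y, n_classes=10, ridge=0.01):
    """MLE mean and full covariance matrix for each class."""
    d = x.shape[1]
    means = np.zeros((n_classes, d))
    covs = np.zeros((n_classes, d, d))
    for k in range(n_classes):
        xk = x[y == k]
        means[k] = xk.mean(axis=0)
        centered = xk - means[k]
        # The MLE covariance divides by N_k, not N_k - 1.
        # Assumption: ridge * I keeps the matrix invertible.
        covs[k] = centered.T @ centered / len(xk) + ridge * np.eye(d)
    return means, covs

def class_log_likelihoods(x, means, covs):
    """log p(x | y = k) for every row of x and every class k."""
    n, d = x.shape
    out = np.zeros((n, len(means)))
    for k, (mu, cov) in enumerate(zip(means, covs)):
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance, computed without forming the inverse.
        maha = (diff * np.linalg.solve(cov, diff.T).T).sum(axis=1)
        out[:, k] = -0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    return out

def predict(x, means, covs):
    """Most likely class under a uniform prior over classes."""
    return class_log_likelihoods(x, means, covs).argmax(axis=1)
```

Working in log space avoids underflow, since a 64-dimensional density can be far smaller than the smallest representable float.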

3) Naive Bayes classifier

  • The real-valued features x were converted into binary features b using 0.5 as a threshold: b_j = 1 if x_j > 0.5, otherwise b_j = 0.

  • A Bernoulli Naive Bayes classifier was trained on the binary features b and the class labels, using MAP estimation with a Beta(α, β) prior, α = β = 2. The fitted model is:

    eq1

    eq2

    eq3

    eq4
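The binarization and MAP fit described above can be sketched as follows, assuming NumPy. This is an illustration rather than the code in naive_bayes.py; the function names and the parameter name `eta` are hypothetical. For a Bernoulli likelihood with a Beta(α, β) prior, the MAP estimate is (N_kj + α - 1) / (N_k + α + β - 2), which for α = β = 2 reduces to (N_kj + 1) / (N_k + 2). Here N_kj is the number of class-k images with pixel j on and N_k is the number of class-k images.

```python
import numpy as np

def binarize(x, threshold=0.5):
    """b_j = 1 if x_j > threshold, otherwise 0."""
    return (np.asarray(x) > threshold).astype(float)

def fit_bernoulli_nb(b, y, n_classes=10, alpha=2.0, beta=2.0):
    """MAP estimate of p(b_j = 1 | y = k) under a Beta(alpha, beta) prior."""
    eta = np.zeros((n_classes, b.shape[1]))
    for k in range(n_classes):
        bk = b[y == k]
        # Mode of the Beta posterior; never exactly 0 or 1 when alpha = beta = 2.
        eta[k] = (bk.sum(axis=0) + alpha - 1) / (len(bk) + alpha + beta - 2)
    return eta

def predict_nb(b, eta):
    """Most likely class, assuming a uniform prior over classes."""
    log_lik = b @ np.log(eta).T + (1 - b) @ np.log(1 - eta).T
    return log_lik.argmax(axis=1)
```

Because the prior keeps every estimate strictly between 0 and 1, the logarithms in `predict_nb` stay finite even for pixels that are always on or always off within a class.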