K-Nearest Neighbours

Algorithm Overview

This implementation of the K-Nearest Neighbours algorithm can be broken down into the following steps:

Load the dataset containing the handwritten digits
Partition the dataset into a training set and a testing set
For each image in the testing set:
1. Calculate the distance between the test image and all training images.
2. Sort the calculated distances in ascending order and keep the K smallest distances.
3. Take the most frequent label among the training images with the K smallest distances.
4. Return that label as the prediction.
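
The per-image steps above can be sketched as follows. This is a minimal illustration rather than the repository's actual code: the function name `knn_predict`, the use of NumPy, and the choice of Euclidean distance are all assumptions, since the text does not name a distance metric.

```python
from collections import Counter

import numpy as np


def knn_predict(train_images, train_labels, test_image, k=7):
    """Predict one label by majority vote among the k nearest training images.

    Hypothetical helper: images are assumed to be flattened into 1-D pixel
    vectors, and Euclidean distance is an assumed metric.
    """
    # Step 1: distance between the test image and every training image.
    diffs = train_images.astype(np.float64) - test_image.astype(np.float64)
    distances = np.sqrt((diffs ** 2).sum(axis=1))

    # Step 2: sort ascending and keep the indices of the k smallest distances.
    nearest = np.argsort(distances, kind="stable")[:k]

    # Step 3: count the labels of those k neighbours.
    votes = Counter(train_labels[nearest].tolist())

    # Step 4: return the most frequent label as the prediction.
    return votes.most_common(1)[0][0]
```

When two labels receive the same number of votes, this sketch returns whichever of them appears first among the sorted neighbours. That is one arbitrary tie-breaking choice; the original implementation may handle ties differently.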

Dataset

The Digit Recognizer dataset contains 40,000 samples of 28x28 images, each of which represents a handwritten numerical digit. Note: the file used is "train.csv", which was partitioned into a training subset and a testing subset.
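
A rough sketch of loading the file and partitioning it is shown below. It assumes the CSV has a single header row and that each following row holds the label in the first column and the flattened pixel values in the remaining columns. The function name `load_and_split`, the random shuffle, and the `seed` parameter are hypothetical and not taken from the repository.

```python
import numpy as np


def load_and_split(csv_path, n_train, n_test, seed=0):
    """Load a label-plus-pixels CSV and split it into training and testing subsets.

    Hypothetical helper; the column layout is an assumption about the file.
    """
    # Assumed layout: one header row, then "label, pixel values..." per row.
    data = np.loadtxt(csv_path, delimiter=",", skiprows=1, dtype=np.int64, ndmin=2)
    if n_train + n_test > len(data):
        raise ValueError("requested more samples than the file contains")

    # Shuffle the row order so the two subsets are disjoint random samples.
    order = np.random.default_rng(seed).permutation(len(data))
    train_rows = data[order[:n_train]]
    test_rows = data[order[n_train:n_train + n_test]]

    # Column 0 is the label; the remaining columns are the flattened image.
    train = (train_rows[:, 1:], train_rows[:, 0])
    test = (test_rows[:, 1:], test_rows[:, 0])
    return train, test
```

Shuffling before the split is a design choice of this sketch. Taking the first rows for training and the next rows for testing would also produce a valid partition.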

Results

To evaluate this implementation of K-Nearest Neighbours, an experiment was run using 35,000 training samples, 250 test samples, and a K value of 7. The implementation correctly classified the handwritten digit 81.53% of the time.
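
A classification rate like the one above is the share of test images whose predicted label matches the true label. The sketch below shows one way to compute it. The helper name `accuracy_percent` is hypothetical, and the code does not reproduce the reported experiment.

```python
def accuracy_percent(predictions, true_labels):
    """Return the percentage of predictions that equal the true labels."""
    predictions = list(predictions)
    true_labels = list(true_labels)
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one test sample is required")

    # Count the positions where the predicted label matches the true label.
    correct = sum(1 for p, t in zip(predictions, true_labels) if p == t)
    return 100.0 * correct / len(true_labels)
```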

Execution

Download the dataset and copy it into the data/ directory. Rename the file to "dataset.csv", then run the following from the command line: python main.py

GrahlmanMatthew/KNN-Digit-Recognition
