This implementation of the K-Nearest Neighbours algorithm can be broken down into the following steps:
- Load the dataset containing the handwritten digits
- Partition the dataset into a training set and a testing set
- For each image in the testing set:
  - Calculate the distance between the test image and every training image.
  - Sort the distances in ascending order and keep the K smallest.
  - Take the most frequent label among the K training images with the smallest distances.
  - Return that label as the prediction.
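
The per-image steps above can be sketched as follows. This is a minimal illustration, not necessarily the code in main.py: the function names `euclidean_distance` and `knn_predict` are hypothetical, and Euclidean distance is an assumption, since the steps do not say which distance metric is used.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length pixel vectors.

    Assumption: the steps above do not name a metric; L2 is a common choice.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_images, train_labels, test_image, k=7):
    """Predict a label for test_image by majority vote of its k nearest neighbours."""
    # Distance from the test image to every training image, paired with that image's label.
    distances = [
        (euclidean_distance(test_image, image), label)
        for image, label in zip(train_images, train_labels)
    ]
    # Sort ascending by distance and keep the k closest training images.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

When two labels receive the same number of votes, `Counter.most_common` returns the one counted first, which here is the label of the closer neighbour. Other tie-breaking rules are equally valid.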
The Digit Recognizer dataset contains 40,000 samples, each a 28x28 image representing a handwritten digit. Note: only the "train.csv" file is used; it was partitioned into a training subset and a testing subset.
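
Loading and partitioning the file might look like the sketch below. It assumes the CSV has a header row, with the label in the first column followed by one column per pixel. The helper names `load_dataset` and `partition`, and the choice to split by row order rather than at random, are illustrative assumptions.

```python
import csv


def load_dataset(path):
    """Read a CSV of digits into (images, labels).

    Assumed layout: one header row, then rows of `label, pixel, pixel, ...`.
    """
    images, labels = [], []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            labels.append(int(row[0]))
            images.append([int(value) for value in row[1:]])
    return images, labels


def partition(images, labels, n_train, n_test):
    """Split by row order: the first n_train rows train, the next n_test rows test."""
    if n_train + n_test > len(images):
        raise ValueError("not enough samples for the requested split")
    train = (images[:n_train], labels[:n_train])
    test = (images[n_train:n_train + n_test], labels[n_train:n_train + n_test])
    return train, test
```

For example, `partition(images, labels, 35000, 250)` would give a split with 35,000 training samples and 250 test samples.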
To evaluate this implementation of K-Nearest Neighbours, an experiment was run using 35,000 training samples, 250 test samples and a K value of 7. The implementation correctly classified the handwritten digit 81.53% of the time.
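
The reported figure is a classification accuracy: the share of test images whose predicted label matches the true label. A minimal sketch of that calculation, using a hypothetical helper named `accuracy`:

```python
def accuracy(predictions, true_labels):
    """Fraction of predictions that equal the corresponding true label."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, t in zip(predictions, true_labels) if p == t)
    return correct / len(true_labels)
```

Multiplying the result by 100 gives the percentage form used above.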
Download the dataset and copy it into the data/ directory. Rename the file to "dataset.csv", then run the following from the command line:
python main.py