This repository contains my code and solutions following along with various machine learning courses.
My goal is to learn the basics of machine learning and get a better understanding of the math behind it. My long-term goal is to apply this knowledge to geophysical problems.
My background is in geophysics, and I have a thorough understanding of Python through my hobbies and work.
I am following along with this YouTube playlist by Patrick Loeber.
KNN (k-nearest neighbours) is a simple algorithm that classifies a point by the majority class among its nearest neighbours. The algorithm is as follows:
- Calculate the distance between the point and all other points in the dataset.
- Sort the distances and determine the $k$ nearest neighbours based on the $k$ nearest distances.
- Determine the majority of the neighbours and classify the point as that class.
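
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's implementation: the function name `knn_predict`, its parameters, and the toy data are all made up for this example, and it leans on the standard library's `math.dist` (Python 3.8+) for the distance.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every training point.
    distances = [math.dist(p, query) for p in train_points]
    # Step 2: indices of the k smallest distances.
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Step 3: majority vote over the labels of those neighbours.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0)]
labels = ["blue", "blue", "blue", "red", "red"]
print(knn_predict(points, labels, (2.0, 2.0), k=3))  # blue
```

With an even $k$ the vote can tie; this sketch then returns whichever tied label was counted first, which is one reason an odd $k$ is a common choice for two-class problems.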
The distance between two points can be calculated using the Euclidean distance formula:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Or for a point in $n$ dimensions:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$
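
The Euclidean distance is the square root of the summed squared differences in each dimension, so the same code covers the 2D case and the $n$-dimensional case. A small sketch using only the standard library; the name `euclidean_distance` is chosen here for illustration and is not taken from the course.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square the per-dimension differences, sum them, then take the root.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(euclidean_distance((1, 2, 3), (1, 2, 3)))  # 0.0
```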
An example plot of how this prediction works is shown below. The green point is the point we are trying to classify. The blue and red points are the training data. In this case, $k = 3$ and the majority of the nearest neighbours are blue, so the point is classified as blue (notably, if it were 0.1 further to the left, it would be classified as red due to the spike in nearest neighbours).