
K-Nearest Neighbors implemented from scratch, comparing the distance metrics [Euclidean, Minkowski, Manhattan, Hamming] to find the optimal accuracy; best accuracy: 96.67%.


KNearestNeighbors

Objective

Optimize two hyperparameters (K-value and distance function) for a K-Nearest Neighbors model.

Distance Function: [Euclidean, Minkowski, Manhattan, Hamming] (sketched below)
K-Value: the output has 3 classes, so an even K-value is recommended
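
A minimal sketch of the four distance functions, assuming NumPy arrays as inputs (names and signatures are illustrative, not necessarily those used in src/main.py):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance: square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # City-block distance: sum of absolute differences
    return np.sum(np.abs(a - b))

def minkowski(a, b, p=3):
    # Generalization of the above: p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

def hamming(a, b):
    # Fraction of coordinates where the two vectors differ
    return np.mean(a != b)
```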

Model

K-Nearest Neighbors: A non-parametric classification model that computes the distance from each test observation to every observation in the training dataset and outputs the class with the highest frequency among the K most similar instances.
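
As a sketch of that prediction rule (assuming the distance functions above; knn_predict is an illustrative name, not the repo's API):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k, distance):
    # Distance from the test observation to every training observation
    dists = [distance(x, x_test) for x in X_train]
    # Indices of the K most similar (closest) training instances
    nearest = np.argsort(dists)[:k]
    # Majority vote: output the class with the highest frequency
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```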

KNN Disciplines
Lazy Learning: Training is not required and all of the work happens at the time a prediction is requested.
Instance-Based Learning: Raw training instances are used to make predictions.
Non-Parametric: KNN makes no assumptions about the functional form of the problems being solved.

AVOID

  • Curse of Dimensionality: as the number of dimensions increases, the volume of the input space grows at an exponential rate, so training points become sparse and "nearest" neighbors stop being near (illustrated below)
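
A quick numeric illustration of that exponential growth: the edge length a sub-cube needs to cover a fixed 10% of a d-dimensional unit cube approaches 1 as d grows.

```python
# Edge length of a sub-cube covering 10% of a d-dimensional unit cube
for d in (1, 2, 10, 100):
    print(d, round(0.10 ** (1 / d), 2))
# prints: 1 0.1, 2 0.32, 10 0.79, 100 0.98
```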

Repository File Structure

├── src          
│   └── main.py              # Optimizes the two hyperparameters (K-value and distance function) for the KNN model
├── plots
│   └── ErrorRatekValue.png  # Error rate vs. K-value plot
├── requierments.txt         # Packages used for the project
└── README.md

Outputs & Distance Functions

  • Euclidean Distance: K-Nearest Neighbor accuracy 96.67%
  • Minkowski Distance: K-Nearest Neighbor accuracy 96.67%
  • Manhattan Distance: K-Nearest Neighbor accuracy 93.33%
  • Hamming Distance: K-Nearest Neighbor accuracy 93.33%
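
The accuracies above come from sweeping both hyperparameters. A minimal sketch of such a sweep, reusing the knn_predict and distance sketches from earlier and a train/test split like the one sketched in the Data section below (the actual loop lives in src/main.py; the K range 1-20 is an assumption):

```python
import numpy as np

def accuracy(X_train, y_train, X_test, y_test, k, distance):
    # Fraction of test observations classified correctly for one (k, distance) pair
    preds = [knn_predict(X_train, y_train, x, k, distance) for x in X_test]
    return np.mean(np.asarray(preds) == np.asarray(y_test))

# Error rate per K for each distance function, as plotted in plots/ErrorRatekValue.png
for distance in (euclidean, minkowski, manhattan, hamming):
    errors = [1 - accuracy(X_train, y_train, X_test, y_test, k, distance)
              for k in range(1, 21)]
    print(distance.__name__, "best accuracy:", 1 - min(errors))
```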

Data

Target Class:
Iris-setosa       float64
Iris-versicolor   float64
Iris-virginica    float64

Features:     
Sepal-width       float64
Sepal-length      float64
Petal-width       float64
Petal-length      float64
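
A minimal sketch of loading and splitting this data (the CSV file name, the label column name, and the 80/20 split are assumptions, not necessarily what src/main.py does):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("iris.csv")  # assumed file name
X = df[["Sepal-length", "Sepal-width", "Petal-length", "Petal-width"]].to_numpy()
y = df["Class"].to_numpy()    # assumed label column holding the three species

# Shuffle, then hold out 20% of the rows as the test set
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]
```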