This repository contains an object-oriented implementation of k-Nearest Neighbors, built from scratch from the underlying mathematical theory.
This model has not been optimized and is intended for educational purposes rather than maximum performance.
These instructions will get a copy of the project up and running on your local machine.
The following software is needed to run the project. Installation instructions are in the next section, Installing.
- Python 2.7
These Python packages are also needed:
- numpy
- matplotlib
- pandas
- scikit-learn
- sortedcontainers
If your computer does not already have Python 2.7 installed, download it here.
Python normally ships with pip (a package manager). Open a terminal/command line and install the dependencies by entering each of the following lines as a separate command:
pip install numpy
pip install matplotlib
pip install pandas
pip install scikit-learn
pip install sortedcontainers
For all models, it is assumed that the model receives well-prepared and cleaned input data X and targets T. Any feature engineering should be done prior to creating a model.
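As an illustration of what "well-prepared" input could look like, here is a minimal sketch that uses pandas to drop incomplete rows, standardize the features, and extract `X` and `T` as numpy arrays. The data and the column names (`height`, `weight`, `label`) are made up for this example, and standardizing is only one possible preparation step, suggested here because k-Nearest Neighbors is distance-based.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are hypothetical
df = pd.DataFrame({
    'height': [1.5, 1.8, np.nan, 1.6],
    'weight': [50.0, 80.0, 65.0, 55.0],
    'label': [0, 1, 1, 0],
})

# Drop rows with missing values
df = df.dropna()

# Standardize each feature so no single one dominates the distance
features = df[['height', 'weight']]
features = (features - features.mean()) / features.std()

X = features.values      # input data, shape (N, D)
T = df['label'].values   # target data, shape (N,)
```

After this, `X` and `T` can be passed to the model as shown in the usage example.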
- Clone/fork this repository
- Open and run the 'example.py' file
- Create a new Python script
- Save it in the cloned/forked folder
- Use a format similar to the following:
import matplotlib.pyplot as plt
from k_nearest_neighbors import KNN

X = ... # input data
T = ... # target data

# Exploratory data analysis: color each point by its target
plt.scatter(X[:, 0], X[:, 1], s=100, c=T, alpha=0.5)
plt.show()
proportion_train = 0.8
Ntrain = int(proportion_train * len(X))
Xtrain, Ttrain = X[:Ntrain], T[:Ntrain]
Xtest, Ttest = X[Ntrain:], T[Ntrain:]
train_scores = []
test_scores = []
ks = (1, 2, 3, 4, 5)
for k in ks:
    model = KNN(k)
    model.train(Xtrain, Ttrain)
    train_score = model.score(Xtrain, Ttrain)
    train_scores.append(train_score)
    test_score = model.score(Xtest, Ttest)
    test_scores.append(test_score)
    print 'k:', k
    print 'Training Accuracy:', train_score
    print 'Test Accuracy:', test_score, '\n'
plt.plot(ks, train_scores, label='Train Scores')
plt.plot(ks, test_scores, label='Test Scores')
plt.legend()
plt.show()
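The split in the script above takes the first 80% of the rows as training data, so if the rows are ordered (for example, by class), the test set may not be representative. Here is a minimal sketch of shuffling before splitting, assuming scikit-learn's `shuffle` from `sklearn.utils`. The data is made up for illustration, and `random_state=0` is only there to make the run repeatable.

```python
import numpy as np
from sklearn.utils import shuffle

# Made-up data for illustration: 10 points, sorted by class
X = np.arange(20, dtype=float).reshape(10, 2)
T = np.array([0] * 5 + [1] * 5)

# Shuffle X and T together so each row keeps its own target
X_shuffled, T_shuffled = shuffle(X, T, random_state=0)

# Same 80/20 split as in the script above, now on shuffled data
proportion_train = 0.8
Ntrain = int(proportion_train * len(X_shuffled))
Xtrain, Ttrain = X_shuffled[:Ntrain], T_shuffled[:Ntrain]
Xtest, Ttest = X_shuffled[Ntrain:], T_shuffled[Ntrain:]
```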
- Python - Programming language the model and the example script are written in
- Numpy - Python library for mathematical and matrix operations
- Matplotlib - Python library for graphing data
- Pandas - Python library for data manipulation
- Scikit-learn - Python library used for its shuffle function
- Sortedcontainers - Python library with a sorted list to keep track of nearest points
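To show how a sorted list can keep track of the nearest points, here is a minimal sketch of predicting a single point with `SortedList` from sortedcontainers. It is not the code from `k_nearest_neighbors.py`: the function name `knn_predict`, the squared Euclidean distance, and the simple majority vote are assumptions made for this illustration, and the data is made up.

```python
import numpy as np
from sortedcontainers import SortedList


def knn_predict(Xtrain, Ttrain, x, k):
    """Hypothetical helper: majority vote among the k training points nearest to x."""
    # Holds (squared distance, target) pairs, kept sorted by distance
    nearest = SortedList()
    for xt, t in zip(Xtrain, Ttrain):
        diff = xt - x
        d = float(diff.dot(diff))  # squared Euclidean distance
        if len(nearest) < k:
            nearest.add((d, t))
        elif d < nearest[-1][0]:
            # Closer than the farthest point kept so far: replace it
            del nearest[-1]
            nearest.add((d, t))

    # Count how often each target appears among the k nearest points
    votes = {}
    for _, t in nearest:
        votes[t] = votes.get(t, 0) + 1
    return max(votes, key=votes.get)


# Made-up data: two well-separated clusters
Xtrain = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
Ttrain = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(Xtrain, Ttrain, np.array([0.2, 0.1]), k=3)
pred_b = knn_predict(Xtrain, Ttrain, np.array([5.5, 5.2]), k=3)
```

Because the list stays sorted, the farthest of the current candidates is always the last element, so checking and replacing it does not require scanning the whole list.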
- Eric Yates - GitHub Profile
This project is licensed under the MIT License - see the LICENSE.md file for details.
- LazyProgrammer: For his courses on machine learning