/Group16_knnmodel

Primary LanguagePythonMIT LicenseMIT

pkg_pyknnclassifier

Logo

Documentation Status ci-cd codecov

📄 About

Our package, named "pkg_pyknnclassifier," is a comprehensive toolkit for k-Nearest Neighbors (k-NN) modeling and evaluation. It offers a set of functions designed to facilitate various aspects of working with k-NN algorithms, from loading the data, calculating distances to making predictions and assessing model performance. We aim to simplify the process by providing essential functionalities for data manipulation, model evaluation, and scaling.

Our documentaion: (https://pkg-pyknnclassifier.readthedocs.io/en/latest/)

📦 Functions

This package consists of six functions and explained as below:

  • data_loading(str_of_path, target_column): This function loads data from a file path and split into features and target.
  • scaling(df, impute_strategy, scale_method): This function allows user to choose the method of data imputation and scaling, and apply to the data.
  • calculate_distance(obs_1, obs_2, method = "Euclidean"): This function calculates the Euclidean distance between two observations for the KNN model to find the similarity score.
  • find_neighbors(labeled_arraies, unlabeled_array, k): This function finds the indices of the 'k' nearest neighbors in a collection of labeled arrays to a given unlabeled array.
  • predict(train_X, train_y, unlabel_df, pred_method, k): This function predicts the labels of the unlabled observations based on the similarity score calculated from Euclidean distance.
  • evaluate(y_true, y_pred, metric='accuracy'): This function calculates evaluation metrics such as accuracy, precision, recall, and F1 score for a k-NN model based on true labels and predicted labels.

🛠️ Installation

Option 1 (For Users)

The package has been published to PYPI, we could use pip install

  1. Create and activate a virtual environment using conda
$ conda create --name <env_name> pip -y
$ conda activate <env_name>
  1. Install the package using the command below
$ pip install pkg_pyknnclassifier

Option 2 (For Developers)

To sucessfully run the following commands of installation, we would need conda and poetry, guide included in the link (conda, poetry)

  1. Clone this repository
$ git clone git@github.com:UBC-MDS/Group16_knnmodel.git
  1. Direct to the root of this repository
  2. Create a virtual environment in Conda with Python by the following commands at terminal and activate it:
$ conda create --name pyknnclassifier python=3.11 -y
$ conda activate pyknnclassifier
  1. Install this package via poetry, run the following command.
$ poetry install

✅ Testing

To test this package, please run the following command from the root directory of the repository:

$ pytest tests/
  • branch coverage could be viewed with the following command:
$ pytest --cov-branch --cov=pkg_pyknnclassifier

Usage

To successfully use our knn model to predict the target, please first ensure you have followed the instruction of installation, and then run the following line in a python notebook.

from pkg_pyknnclassifier.data_loading import data_loading
from pkg_pyknnclassifier.scaling import scaling
from pkg_pyknnclassifier.predict import predict
from pkg_pyknnclassifier.evaluate import evaluate

features, target = data_loading('data/toy_dataset.csv', 'Target')
X_scaled = scaling(features, 'median', 'StandardScaler')
pred = predict(X_scaled, target, X_scaled, 'hard', 3)
accuracy_result = evaluate(target, pred, metric='accuracy')
print("Accuracy:", accuracy_result)

📚 Package Integration within the Python Ecosystem

pkg_pyknnclassifier, while acknowledging the robustness and the capabilities of scikit-learn's KNeighborsClassifier, aims to offer a specialized and streamlined toolkit tailored explicitly for k-Nearest Neighbors classification tasks. As a lightweight and focused alternative, pkg_pyknnclassifier serves users who seek a concise package that offers calculating distances, making predictions, and evaluating k-NN models functions. While scikit-learn covers a broader spectrum of machine learning algorithms, pkg_pyknnclassifier provides a more specialized package, potentially appealing to those who prefer a tailored implementation of their k-NN workflows.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

📜 License

pkg_pyknnclassifier was created by "Bill Wan, Sho Inagaki, Shizhe Zhang, Weiran Zhao". It is licensed under the terms of the MIT license.

📚 Credits

pkg_pyknnclassifier was created with cookiecutter and the py-pkgs-cookiecutter template.