Implementation of the Naive Bayes classifier in Python, tested on a diabetes dataset.

Primary language: Jupyter Notebook

Naive Bayes Classifier

This project implements the Naive Bayes classifier from scratch and uses it to predict the diabetes diagnosis (the Outcome label) for each record in the diabetes dataset.
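
For reference, the decision rule applied by a Naive Bayes classifier can be written as below. The notation is introduced here for illustration and is not taken from the notebook: x_1, ..., x_n are the feature values of one data point and y ranges over the classes (the two values of Outcome).

```latex
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
        = \arg\max_{y} \left[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \right]
```

The product form follows from Bayes' theorem together with the "naive" assumption that features are conditionally independent given the class. The log form is equivalent and is usually preferred in code because it avoids numerical underflow.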

The classifier is written from scratch, with minimal to no use of pre-written libraries, as an exercise. The data is shuffled and split manually, and all training and testing steps are done by hand in order to understand in depth how probabilistic classifiers work.
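
A manual shuffle and split could look roughly like the sketch below. The function name `shuffle_split`, the `test_fraction` and `seed` parameters, and the toy data are assumptions made for illustration; the notebook may do this differently.

```python
import numpy as np

def shuffle_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle rows, then split them into train and test parts (hypothetical helper)."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # random ordering of the row indices
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Index features and labels with the same indices so rows stay paired
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data standing in for the diabetes table (assumption: not the real data)
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2
X_train, X_test, y_train, y_test = shuffle_split(X, y, test_fraction=0.3)
print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible between runs, which makes a later comparison against a reference implementation easier to interpret.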

Dataset

The diabetes data set consists of 768 data points, each having 9 columns (8 input features plus the Outcome label):

  • Pregnancies
  • Glucose
  • Blood Pressure
  • Skin Thickness
  • Insulin
  • BMI
  • Diabetes Pedigree Function
  • Age
  • Outcome (Label)

The original source of the data is the UCI Machine Learning Repository. Download it from here.

Libraries

  • NumPy was used to perform vector maths.
  • Pandas was used to store and traverse tabular data.
  • scikit-learn (SKLearn) was used only to compare the results of this Naive Bayes implementation with its reference implementation.

Implementation

See notebook.
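
As a rough idea of what a from-scratch version can look like, here is a minimal sketch. It assumes Gaussian likelihoods for the continuous features, which is an assumption and not something this README states. The class name `GaussianNaiveBayes`, the `1e-9` variance floor, and the synthetic data are hypothetical and are not taken from the notebook.

```python
import numpy as np

class GaussianNaiveBayes:
    """Illustrative Gaussian Naive Bayes (hypothetical, not the notebook's code)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class prior probability, feature means, and feature variances
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant (assumption) avoids division by zero for constant features
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Per-feature Gaussian log-density, shape (n_samples, n_classes, n_features)
        diff = X[:, None, :] - self.means_[None, :, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.vars_) + diff ** 2 / self.vars_)
        # Independence assumption: sum feature log-likelihoods, then add the log prior
        log_post = np.log(self.priors_) + log_pdf.sum(axis=2)
        return self.classes_[np.argmax(log_post, axis=1)]

# Synthetic two-class data (assumption: stands in for the diabetes features)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(5.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = GaussianNaiveBayes().fit(X, y)
accuracy = np.mean(model.predict(X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Working in log space keeps the per-feature product from underflowing when many small probabilities are multiplied together.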

Conclusion

This project successfully implements the Naive Bayes classifier from the underlying theory, producing results that match those of the scikit-learn (SKLearn) reference implementation.