Naive Bayes Classifier

This project implements the Naive Bayes classifier from scratch, in order to predict the diagnosis of diabetes in the diabetes dataset.

The implementation is written as an exercise, with minimal to no use of pre-written libraries. The data is shuffled and split manually, and all training and testing steps are done by hand in order to understand in depth how probabilistic classifiers work.
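
The manual shuffle and split described above could look roughly like the sketch below. It is only an illustration, not the notebook's actual code: the function name `shuffle_split`, the `test_ratio` and `seed` parameters, and the toy arrays are all assumptions made for this example.

```python
import numpy as np

def shuffle_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle the rows of X and y together, then split them into train and test parts.

    Hypothetical helper: the name and parameters are assumptions, not the notebook's API.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # one random ordering shared by X and y
    n_test = int(round(len(X) * test_ratio))   # number of rows held out for testing
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data standing in for the real table: 10 rows, 2 features, alternating labels.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)
X_train, X_test, y_train, y_test = shuffle_split(X_demo, y_demo, test_ratio=0.3, seed=42)
```

Indexing both arrays with the same permutation keeps each label attached to its row, which is the main thing a hand-written split has to get right.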

Dataset

The diabetes data set consists of 768 data points, each with 8 features and a label, for 9 columns in total:

Pregnancies
Glucose
Blood Pressure
Skin Thickness
Insulin
BMI
Diabetes Pedigree Function
Age
Outcome (Label)

The original source of the data is the UCI Machine Learning Repository. Download it from here.

Libraries

NumPy was used to perform vector maths.
Pandas was used to store and traverse tabular data.
SKLearn was used to compare the results of this implementation of Naive Bayes with the SKLearn reference implementation.

Implementation

See notebook.
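
For readers who want a feel for the approach before opening the notebook, here is a minimal sketch of a Naive Bayes classifier built on NumPy. It assumes Gaussian likelihoods for the continuous features, which this README does not state explicitly. The class name `GaussianNaiveBayes`, the small variance floor of `1e-9`, and the toy data are assumptions for illustration and may differ from the notebook.

```python
import numpy as np

class GaussianNaiveBayes:
    """Illustrative sketch; assumes each feature is Gaussian within each class."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Log prior: the fraction of training rows belonging to each class.
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # Per-class, per-feature mean and variance.
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # The 1e-9 floor is an assumed safeguard against zero variance.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                          + diff ** 2 / self.var_[None, :, :])
        # "Naive" independence: sum the per-feature log likelihoods, then add the prior.
        scores = self.log_prior_[None, :] + log_lik.sum(axis=2)
        return self.classes_[np.argmax(scores, axis=1)]

# Toy data: two well-separated clusters standing in for the real features.
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNaiveBayes().fit(X_toy, y_toy)
preds = model.predict(np.array([[1.1, 2.1], [5.1, 5.9]]))
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.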

Conclusion

This project successfully implements the Naive Bayes classifier based on the theory, producing results that match the reference implementation of the SKLearn library.
