Final project for a Machine Learning course: implementing a Gaussian Naive Bayes algorithm and comparing it to scikit-learn's version of the algorithm

Primary language: Jupyter Notebook. License: MIT.

gaussian-naive-bayes

Final project for DATA 2060 Fall 2024: Machine Learning: from Theory to Practice. We implement a Gaussian Naive Bayes algorithm and compare our implementation to scikit-learn's version of the algorithm on the UCI wine dataset.

The Jupyter notebook and report outline the conceptual framework of the algorithm and the mathematics underlying it, and then implement and test the algorithm. An outline of the code is as follows:

  1. Explains the data representation used, the loss and optimizer functions, and the conceptual outline of the model.
  2. Implements the model.
  3. Discusses the cases used for unit testing and implements the unit tests.
  4. Examines how well the data fit the Gaussian Naive Bayes assumptions of independent, normally distributed features.
  5. Tests our model and compares its performance with the GaussianNB algorithm from scikit-learn.
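
The model described in steps 1 and 2 can be illustrated with a minimal NumPy sketch. This is not the project's code: the class name `GaussianNaiveBayesSketch`, its attribute names, and the synthetic two-cluster data are assumptions made for illustration, and no variance smoothing is applied. The sketch estimates per-class priors, feature means, and maximum-likelihood variances, then predicts the class with the largest joint log-likelihood.

```python
import numpy as np


class GaussianNaiveBayesSketch:
    """Hypothetical minimal Gaussian Naive Bayes; not the project's actual class."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class prior, feature means, and biased (maximum-likelihood) variances.
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_])
        return self

    def joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # Broadcast to shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.means_[None, :, :]
        log_pdf = -0.5 * (np.log(2.0 * np.pi * self.vars_) + diff**2 / self.vars_)
        # Independence assumption: sum the per-feature log densities.
        return np.log(self.priors_) + log_pdf.sum(axis=2)

    def predict(self, X):
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


# Synthetic data (an assumption for this example): two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

model = GaussianNaiveBayesSketch().fit(X, y)
preds = model.predict(X)
print("training accuracy:", np.mean(preds == y))
```

Working in log space avoids numerical underflow when many per-feature densities are multiplied together.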

There is also a copy of our final project presentation, given on Dec 10, 2024. We later found that all differences between our output and scikit-learn's were due to variance smoothing: our results matched exactly once we turned off variance smoothing for GaussianNB.

The packages used include:

  • python=3.12.5
  • matplotlib=3.9.1
  • pandas=2.2.2
  • scikit-learn=1.5.1
  • numpy=2.0.1
  • jupyter
  • pytest
  • quadprog

A copy of the .yml environment file used for the course is also provided for convenience.
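
To confirm that an existing environment matches the pinned versions listed above, a small check along these lines may help. It is only a sketch: the `PINNED` dictionary and the `find_mismatches` helper are hypothetical names, and the unpinned packages (jupyter, pytest, quadprog) are left out.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the package list; unpinned packages are omitted.
PINNED = {
    "matplotlib": "3.9.1",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "numpy": "2.0.1",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every pin not met."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


mismatches = find_mismatches(PINNED)
for name, (expected, installed) in mismatches.items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the listed version.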