
A study of the usefulness of PCA for improving NB performance across a variety of datasets

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

Study of PCA and Naive Bayes

This repo contains the code and materials for an empirical study of whether Principal Component Analysis (PCA) improves the performance of Naive Bayes classifiers.
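As a rough illustration of the idea under study (not the code in src/pca_nb.R), the sketch below compares a Gaussian Naive Bayes classifier on raw features against the same classifier on PCA-transformed features, using the built-in iris data. The helpers gnb_fit and gnb_predict, the train/test split, and the choice of k = 2 components are all assumptions made for illustration; the study itself may rely on a package implementation and different settings.

```r
# Hypothetical illustration only; base R, no extra packages needed.
set.seed(1)
idx   <- sample(nrow(iris), 100)          # simple random train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit PCA on the training features only, then project both splits.
pca     <- prcomp(train[, 1:4], center = TRUE, scale. = TRUE)
k       <- 2                              # components kept (arbitrary choice)
z_train <- predict(pca, train[, 1:4])[, 1:k, drop = FALSE]
z_test  <- predict(pca, test[, 1:4])[, 1:k, drop = FALSE]

# Minimal Gaussian Naive Bayes: per-class prior, mean and sd of each feature.
gnb_fit <- function(x, y) {
  lapply(split(as.data.frame(x), y), function(d) {
    list(prior = nrow(d) / nrow(x),
         mu    = colMeans(d),
         sd    = apply(d, 2, sd))
  })
}

# Predict the class with the highest log-posterior (up to a constant).
gnb_predict <- function(model, x) {
  x <- as.matrix(x)
  scores <- sapply(model, function(m) {
    ll <- sapply(seq_len(ncol(x)), function(j) {
      dnorm(x[, j], mean = m$mu[j], sd = m$sd[j], log = TRUE)
    })
    log(m$prior) + rowSums(ll)
  })
  colnames(scores)[max.col(scores, ties.method = "first")]
}

pred_raw <- gnb_predict(gnb_fit(train[, 1:4], train$Species), test[, 1:4])
pred_pca <- gnb_predict(gnb_fit(z_train, train$Species), z_test)

acc_raw <- mean(pred_raw == test$Species)
acc_pca <- mean(pred_pca == test$Species)
cat("Accuracy (raw features):", acc_raw, "\n")
cat("Accuracy (PCA features):", acc_pca, "\n")
```

Fitting the PCA on the training split alone keeps information from the test rows out of the projection, which matters when comparing the two accuracies.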

More details and the findings are in the paper.

The remainder of this readme describes how to run the code to replicate the study.

Data

This study uses several external datasets. However, if you want to skip them, you can restrict the code to the datasets bundled with R packages by setting BUILTIN_ONLY=TRUE at the top of src/pca_nb.R.
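A flag like this typically gates which datasets get loaded. The following is only a guess at the pattern, not the actual contents of src/pca_nb.R: the loader lists and the file name example.csv are hypothetical, and only the BUILTIN_ONLY name comes from the script.

```r
BUILTIN_ONLY <- TRUE  # set to FALSE to also load external files from data/

# Hypothetical registries of dataset loaders (names are illustrative only).
builtin_loaders <- list(
  iris = function() iris
)
external_loaders <- list(
  example = function() read.csv(file.path("data", "example.csv"))
)

# Keep only the built-in loaders when the flag is set.
loaders  <- if (BUILTIN_ONLY) builtin_loaders else c(builtin_loaders, external_loaders)
datasets <- lapply(loaders, function(load) load())

print(names(datasets))
```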

External datasets:

The following files should be obtained and copied to the data/ folder:

Environment

To make sure you have the right environment set up for the code, it's recommended that you use renv.

Once renv is installed, you can run renv::restore() from an R session in the project directory to get the specific package versions that I used.
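For reference, a typical setup from an R session started in the project directory might look like the following. This assumes the repo ships an renv.lock file and that renv is installed from CRAN; both are assumptions rather than something stated here.

```r
# Install renv once if it is not already available (assumes CRAN access).
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv")
}

# Restore the package versions recorded in the project's lockfile.
renv::restore()
```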

Code

Run the pca_nb.R file, either by sourcing it in RStudio or from the command line:

Rscript src/pca_nb.R

This will print the results and save a copy of the results table to data/results_table.csv.
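To inspect the saved table afterwards from an R session, something along these lines should work. The column layout of the CSV is not documented here, so the sketch only prints its structure rather than assuming any column names.

```r
# Path written by src/pca_nb.R, relative to the project directory.
results_path <- file.path("data", "results_table.csv")

if (file.exists(results_path)) {
  results <- read.csv(results_path, stringsAsFactors = FALSE)
  str(results)   # show columns and types without assuming their names
} else {
  message("No results file found; run src/pca_nb.R first.")
}
```

From within RStudio, the script itself can be run with source("src/pca_nb.R") before reading the table.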