This project uses the Bernoulli Naive Bayes algorithm to predict whether a gene is essential for an organism's survival.
- Download the repository
- Run main.py
- The script outputs the classification results for the S. mikatae dataset and plots the ROC curve of the predictions for the S. cerevisiae dataset.
The datasets are read with Pandas, converted to NumPy arrays, and then discretized. Some features in the S. mikatae dataset contained unknown values, which have been replaced with 0.
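The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the inline CSV, the column names `f1`, `f2`, and `label`, the `?` marker for unknown values, and the median threshold used for discretization are all assumptions made for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a dataset file; "?" marks an unknown value.
csv_text = """f1,f2,label
0.2,1.5,1
?,0.1,0
0.9,?,1
0.4,0.7,0
"""

# Read with Pandas, treating "?" as missing, then replace unknowns with 0.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
df = df.fillna(0)

# Convert to NumPy arrays.
X_raw = df[["f1", "f2"]].values.astype(float)
y = df["label"].values.astype(int)

# Discretize each feature to 0/1 (assumed rule: above the column median -> 1),
# which is the binary input that Bernoulli Naive Bayes expects.
thresholds = np.median(X_raw, axis=0)
X = (X_raw > thresholds).astype(int)

print(X)
```

Any binarization rule would serve here; the median is used only because it needs no extra parameters.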
10-fold cross-validation is used to evaluate the classifier's accuracy.
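As a rough sketch of this evaluation step, the snippet below runs 10-fold cross-validation of scikit-learn's `BernoulliNB` on synthetic binary data. The data, the random seeds, and the choice of a stratified, shuffled split are assumptions for illustration; the actual script may split the folds differently.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB

# Synthetic binary features and labels (placeholder for the real datasets).
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 8))
y = X[:, 0] | X[:, 1]

# 10 folds; stratification keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(BernoulliNB(), X, y, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Reporting the per-fold scores alongside the mean makes it easier to see how much the accuracy varies between folds.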
Both the classification results and the ROC curve appear to be consistent with the reference they were compared against.
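For readers who want to reproduce an ROC computation of this kind, here is a minimal sketch using scikit-learn's `roc_curve` and `auc` on cross-validated probabilities. The synthetic data and the use of `cross_val_predict` are assumptions; the way main.py obtains its scores may differ.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import BernoulliNB

# Synthetic data: the label follows feature 0, with about 20% of labels flipped.
rng = np.random.RandomState(1)
X = rng.randint(0, 2, size=(300, 6))
flip = rng.rand(300) < 0.2
y = np.where(flip, 1 - X[:, 0], X[:, 0])

# Out-of-fold class probabilities from 10-fold cross-validation.
proba = cross_val_predict(BernoulliNB(), X, y, cv=10, method="predict_proba")
positive_scores = proba[:, 1]  # probability of the positive class (label 1)

# False positive rate and true positive rate at each score threshold.
fpr, tpr, _ = roc_curve(y, positive_scores)
roc_auc = auc(fpr, tpr)

print("AUC:", roc_auc)
```

The `fpr` and `tpr` arrays can then be passed to Matplotlib, for example `plt.plot(fpr, tpr)`, to draw the curve.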
Software | Version | Required |
---|---|---|
Python | >= 3 | Yes |
Numpy (Python Package) | Tested on v1.13.3 | Yes |
Scikit-learn (Python Package) | Tested on v0.19.1 | Yes |
Pandas (Python Package) | Tested on v0.21.1 | Yes |
Matplotlib | Tested on v2.1.1 | Yes |