Plot of 'Generalization Error vs. Training Examples' on Naive Bayes and Perceptron predictions
-
Download the repository; the datasets are included in the 'Dataset' folder.
-
For each dataset, run /dataset_name.py to plot the respective error curve.
The datasets come from the UCI Repository:
- Adult
- Blood Transfusion
- Breast Cancer
- Cryotherapy
- Fertility
- Ionosphere
- Mammographic Masses
- Mushrooms
- Pima
- Sonar
Depending on the dataset, some pre-processing has been applied. The classifiers are evaluated with cross-validation, either 4-fold or stratified 3-fold. The function plot_learning_curve() computes cross-validated test scores for different training set sizes and plots the Naive Bayes and Perceptron curves.
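
As a rough illustration of the idea, the sketch below computes error curves with scikit-learn's `learning_curve` and plots them. It is not the repository's `plot_learning_curve()`: the helper name `error_curve`, the synthetic data from `make_classification`, the choice of `GaussianNB` as the Naive Bayes variant, and the eight training-set sizes are all assumptions made for the example. The error is taken as 1 minus the mean cross-validated accuracy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.naive_bayes import GaussianNB


def error_curve(estimator, X, y, cv, train_sizes):
    """Hypothetical helper: return training-set sizes and mean CV test error."""
    sizes, _, test_scores = learning_curve(
        estimator, X, y, cv=cv, train_sizes=train_sizes, scoring="accuracy"
    )
    # test_scores has one row per training size and one column per fold
    return sizes, 1.0 - test_scores.mean(axis=1)


# Synthetic stand-in data (assumption); the repository uses the UCI datasets.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Stratified 3-fold, one of the two schemes mentioned above.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
train_sizes = np.linspace(0.1, 1.0, 8)  # fractions of the available training fold

models = [
    ("Naive Bayes", GaussianNB()),  # Gaussian variant is an assumption
    ("Perceptron", Perceptron(random_state=0)),
]

fig, ax = plt.subplots()
curves = {}
for name, model in models:
    sizes, errors = error_curve(model, X, y, cv, train_sizes)
    curves[name] = (sizes, errors)
    ax.plot(sizes, errors, marker="o", label=name)

ax.set_xlabel("Training examples")
ax.set_ylabel("Generalization error")
ax.set_title("Generalization Error vs. Training Examples")
ax.legend()
```

To view the figure, call `fig.savefig(...)` with a path of your choice, or remove the `matplotlib.use("Agg")` line and call `plt.show()`. For 4-fold cross-validation, passing `cv=4` to `learning_curve` should work as well; for classifiers, scikit-learn stratifies integer `cv` values by default.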

Required libraries:
- NumPy
- Pandas
- scikit-learn
- Matplotlib