Supported classifiers:
- ID3 decision tree
- C4.5 decision tree
- Naive Bayes classifier with Laplace smoothing*
- Naive Bayes classifier without Laplace smoothing*
* Laplace smoothing == Laplacian correction == additive smoothing
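To illustrate the difference between the two naive Bayes variants, here is a minimal sketch of a Laplace-smoothed conditional probability estimate. This is not the HWThree code itself; the class and method names are made up for the example, and the smoothing constant is assumed to be 1.

```java
// Hypothetical sketch of Laplace (additive) smoothing for a naive Bayes
// conditional probability P(attr = v | class = c). Not the actual HWThree code.
public class LaplaceSketch {
    // count: how many training tuples of class c have attribute value v
    // classCount: total training tuples of class c
    // numValues: number of distinct values the attribute can take
    static double smoothed(int count, int classCount, int numValues) {
        // Add 1 to every count so an unseen value never yields a zero probability.
        return (count + 1.0) / (classCount + numValues);
    }

    static double unsmoothed(int count, int classCount) {
        return (double) count / classCount;
    }

    public static void main(String[] args) {
        // An attribute value never seen with class c:
        System.out.println(unsmoothed(0, 10));  // 0.0 -- zeroes out the whole product of probabilities
        System.out.println(smoothed(0, 10, 3)); // 1/13, small but nonzero
    }
}
```

Without smoothing, a single unseen attribute/class combination forces the entire class probability product to zero, which is why the `-l` flag matters on sparse data.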
Compilation:
javac HWThree.java
Usage:
java HWThree [-b|-c|-i] [training_data] [test_data] [index_of_class_label] [-l]
Where 0 <= 'index_of_class_label' < total number of attributes
If no arguments are given, the program runs the textbook dataset through the C4.5 decision tree
Options:
-b use the naive Bayes classifier
-c use the C4.5 decision tree
-i use the ID3 decision tree
-l use Laplace smoothing (naive Bayes only)
Example usages:
java HWThree -b mushroom.training mushroom.test 0 -l
java HWThree -b mushroom.training mushroom.test 0
java HWThree -c mushroom.training mushroom.test 0
java HWThree -i mushroom.training mushroom.test 0
Note: training and test data must be in the same directory as the program files!
mushroom.training and mushroom.test are derived from the original Mushroom dataset at the UCI Machine Learning Repository, with attribute 11 removed because of missing values. The class label (poisonous "p" or edible "e") is the first attribute (index = 0).
textbook.txt is data from the class textbook, page 338 of Data Mining: Concepts and Techniques, Third Edition, by Han, Kamber, and Pei.
test.txt contains one line representing a tuple to be classified. The tuple was created by hand, loosely based on the test tuples in the textbook.
- The decision tree algorithms do not currently implement pruning, so outliers and noisy data can sometimes cause overfitting.
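For reference, the two decision-tree options differ in their split criterion: ID3 picks the attribute with the highest information gain, while C4.5 normalizes gain by the split's own entropy (gain ratio), which penalizes many-valued attributes. A minimal sketch of both measures, independent of the HWThree implementation (class and method names here are invented for illustration):

```java
// Hypothetical sketch of the ID3 and C4.5 split criteria. Not the HWThree code.
public class SplitCriterionSketch {
    // Shannon entropy of a class-count distribution, in bits.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain (ID3): parent entropy minus the weighted child entropies.
    static double infoGain(int[] parent, int[][] children) {
        int total = 0;
        for (int c : parent) total += c;
        double remainder = 0.0;
        for (int[] child : children) {
            int size = 0;
            for (int c : child) size += c;
            remainder += (double) size / total * entropy(child);
        }
        return entropy(parent) - remainder;
    }

    // Gain ratio (C4.5): information gain divided by the split information,
    // i.e. the entropy of the branch sizes themselves.
    static double gainRatio(int[] parent, int[][] children) {
        int[] sizes = new int[children.length];
        for (int i = 0; i < children.length; i++)
            for (int c : children[i]) sizes[i] += c;
        double splitInfo = entropy(sizes);
        return splitInfo == 0 ? 0 : infoGain(parent, children) / splitInfo;
    }

    public static void main(String[] args) {
        int[] parent = {9, 5};               // e.g. 9 positive, 5 negative tuples
        int[][] children = {{6, 2}, {3, 3}}; // class counts in each branch of a split
        System.out.println(infoGain(parent, children));
        System.out.println(gainRatio(parent, children));
    }
}
```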