An implementation of C4.5 machine learning algorithm in python
C4.5 is an algorithm developed by John Ross Quinlan that creates decision tress. A decision tree is a tool that is used for classification in machine learning, which uses a tree structure where internal nodes represent tests and leaves represent decisions. C4.5 makes use of information theoretic concepts such as entropy to classify the data.
For each dataset there should be two files, one that describes the classes and attributes and one that consists of the actual data. The file for attributes and classes should contain all the classes in first line and after that, line by line the attributes and their possible values if the attribute is discrete. For continuos(numerical) attributes, possible values would be "continuos". Check the iris dataset folder for actual data and more specific syntax.
Create a C4.5 object like this
c1 = C45("path_to_data_file", "path_to_description_file")
After this, you can fetch and preprocess the data, generate the tree and print it to screen.
Navigate to the directory "C4.5" and type python -m unittest discover
to run all the test modules under "C4.5/tests" folder. (the names of the modules should start with "test" and end with ".py")