Implementation of C4.5

Data format needed :
    names file : classes should be listed on the first line. Attributes should be listed one per line, each followed by a colon and its type of data, eg.
        attribute : (d1,d2,d3)
        OR
        attribute : continuous
    data file : class should be the first OR last attribute.

How to run :
    > python main.py [data-file-name] [names-file-name]
    eg.
    > python main.py "../data/house-votes-84/house-votes-84.data" "../data/house-votes-84/house-votes-84.names"

Expected Output :
    sepal length <= 2.45 :
        sepal width <= 6.8 :
            petal width <= 4.15 :
                petal length <= 7.8 : Iris-setosa
                petal length > 7.8 : Iris-virginica
            petal width > 4.15 : Iris-setosa
        sepal width > 6.8 : Iris-virginica
    sepal length > 2.45 : Iris-virginica

The code builds the tree from a random 80% of the rows in the data (training).
Testing is done on the remaining 20% of the data, by a function call on line 16 of main.py.
K-fold cross validation has been implemented; it is called from line 18 of main.py.

(3:38 AM 19 Feb: Testing only works for discrete data. Figuring out how to test for continuous data.)
(UPDATE 4:26 AM 19 Feb: Testing works on continuous data as well.)
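
The 80/20 split and the K-fold cross validation described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in main.py: the function names train_test_split and k_fold_splits, the seed parameter, and the way rows are assigned to folds are all assumptions made for the example.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=None):
    """Shuffle a copy of rows and return (train, test).

    Hypothetical helper: the name and parameters are not taken from main.py.
    """
    rng = random.Random(seed)
    shuffled = list(rows)          # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(rows, k, seed=None):
    """Yield k (train, test) pairs; every row lands in exactly one test fold.

    Hypothetical helper: the fold assignment here (every k-th shuffled row)
    is one simple choice, assumed for this sketch.
    """
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


if __name__ == "__main__":
    # Ten made-up rows with the class as the last attribute.
    rows = [[i, "y" if i % 2 else "n", "classA"] for i in range(10)]

    train, test = train_test_split(rows, seed=0)
    print(len(train), len(test))   # 8 2

    for train, test in k_fold_splits(rows, k=5, seed=0):
        # A tree would be built on train and scored on test here.
        print(len(train), len(test))   # 8 2 for each of the 5 folds
```

With a single split, the accuracy depends on which 20% of rows happen to be held out. K-fold cross validation averages over k such hold-outs, so each row is used for testing exactly once.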