This repository contains an innovative algorithm that constructs an ensemble using well-known decision tree induction algorithms such as CART, C4.5, QUEST and GUIDE, combined with bagging and boosting. This ensemble is then converted to a single, interpretable decision tree in a genetic fashion. For a certain number of iterations, random pairs of decision trees are merged by first converting them to sets of k-dimensional hyperplanes and then calculating the intersection of these two sets (a classic problem from computational geometry). Moreover, in each iteration, an individual is mutated with a certain probability. After these iterations, the accuracy on a validation set is measured for each of the decision trees in the population, and the one with the highest accuracy (and the lowest number of nodes in case of a tie) is returned.

Example.py contains code that runs all implemented algorithms and reports their average predictive performance, computational complexity and model complexity on a number of datasets.
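The genetic loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the names `genetic_loop` and `select_best` are hypothetical, the `merge`, `mutate`, `accuracy` and `n_nodes` callables are placeholders supplied by the caller, and the sketch assumes that one pair is merged per iteration and that the child is simply added to the population.

```python
import random


def select_best(population, accuracy, n_nodes):
    """Return the tree with the highest accuracy; ties go to the fewest nodes."""
    return max(population, key=lambda tree: (accuracy(tree), -n_nodes(tree)))


def genetic_loop(population, merge, mutate, accuracy, n_nodes,
                 iterations=10, mutation_prob=0.1, rng=None):
    """Hypothetical sketch of the merge/mutate/select loop.

    merge(a, b) -> tree   placeholder for the hyperplane-intersection merge
    mutate(tree) -> tree  placeholder for the mutation operator
    accuracy(tree)        accuracy on a validation set
    n_nodes(tree)         model complexity, used only to break ties
    """
    rng = rng or random.Random()
    population = list(population)
    for _ in range(iterations):
        # Merge one random pair and add the child (assumption: no pruning).
        parent_a, parent_b = rng.sample(population, 2)
        population.append(merge(parent_a, parent_b))
        # Mutate one random individual with probability mutation_prob.
        if rng.random() < mutation_prob:
            index = rng.randrange(len(population))
            population[index] = mutate(population[index])
    return select_best(population, accuracy, n_nodes)
```

Sorting on the tuple `(accuracy, -n_nodes)` keeps the tie-breaking rule in one place: accuracy is compared first, and the node count only matters when accuracies are equal.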
An install.sh script is provided that installs all required dependencies.
A documentation page is available in the doc/ directory. Download the complete directory and open index.html.
Wrappers are provided around Orange C4.5, sklearn CART, GUIDE and QUEST. Each wrapper returns a Decision Tree object, which is defined in decisiontree.py. Several methods are available on this decision tree: classifying new, unknown samples; visualising the tree; exporting it to string, JSON and DOT; etc.
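To illustrate what such operations look like on a tree structure, here is a small self-contained sketch. It does not reproduce the class in decisiontree.py: the `Node` class, its fields and its method names are hypothetical and chosen only for this example, which assumes binary splits on numeric thresholds.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary decision tree node (not the repository's class)."""
    label: Optional[str] = None        # class label, set on leaves only
    feature: Optional[str] = None      # feature tested at an internal node
    threshold: Optional[float] = None  # split value for that feature
    left: Optional["Node"] = None      # branch taken when value <= threshold
    right: Optional["Node"] = None     # branch taken when value > threshold

    def classify(self, sample: dict) -> str:
        """Walk from this node down to a leaf and return its label."""
        if self.label is not None:
            return self.label
        if sample[self.feature] <= self.threshold:
            return self.left.classify(sample)
        return self.right.classify(sample)

    def to_dict(self) -> dict:
        """Convert the subtree rooted at this node to nested dictionaries."""
        if self.label is not None:
            return {"label": self.label}
        return {
            "feature": self.feature,
            "threshold": self.threshold,
            "left": self.left.to_dict(),
            "right": self.right.to_dict(),
        }

    def to_json(self) -> str:
        """Export the subtree as a JSON string."""
        return json.dumps(self.to_dict())


tree = Node(feature="x", threshold=1.0,
            left=Node(label="a"), right=Node(label="b"))
```

With this toy tree, `tree.classify({"x": 0.5})` follows the left branch, and `tree.to_json()` yields a nested JSON object that mirrors the tree.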
Wrappers are also provided around the well-known, state-of-the-art ensemble techniques XGBoost and Random Forests.
A new dataset can easily be plugged into the benchmark. To do so, a load_dataset() function must be written in load_datasets.py.
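As a rough idea of what such a loader could look like, the sketch below reads a CSV source and separates the feature columns from the label column. The function name `load_my_dataset`, its parameters and its return value are assumptions made for this example; check the existing loaders in load_datasets.py for the signature and return format the benchmark actually expects.

```python
import csv
import io


def load_my_dataset(source, label_column="class"):
    """Hypothetical loader: read CSV rows and split features from the label.

    source        an open text file (or any file-like object) with a header row
    label_column  name of the column holding the class label (assumed name)
    """
    reader = csv.DictReader(source)
    rows = list(reader)  # each row is a dict mapping column name to string value
    feature_columns = [name for name in reader.fieldnames if name != label_column]
    return rows, feature_columns, label_column


# Usage with an in-memory CSV; a real loader would open a file instead.
sample_csv = io.StringIO("sepal_length,sepal_width,class\n5.1,3.5,setosa\n")
rows, feature_columns, label_column = load_my_dataset(sample_csv)
```

Note that `csv.DictReader` returns every value as a string, so numeric features would still need to be converted before training.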
You can contact me at givdwiel.vandewiele at ugent.be for any questions, proposals or if you wish to contribute.
Please cite this work when you use it, either by referring to this GitHub repository or to the following (as yet unpublished) paper:
@unpublished{genesim,
  title={\textsc{genesim}: genetic extraction of a single, interpretable model},
  author={Gilles Vandewiele and Olivier Janssens and Femke Ongenae and Filip De Turck and Sofie Van Hoecke},
  year={2016},
  note={available at: \url{https://github.com/GillesVandewiele/GENESIM}}
}