/vhsven-sklearn

A template for scikit-learn extensions

Primary LanguageJupyter NotebookBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

vhsven-sklearn - My scikit-learn extensions

This python 3.6 package contains an extension class for scikit-learn's decision tree algorithms. Where scikit-learn by default uses early stopping criteria to prevent a tree from growing too large, this package contains a PruneableDecisionTreeClassifier which uses one of two well-known pruning techniques to keep the size in check. The two techniques are Reduced Error Pruning and Error Based Pruning.

Created based on project-template, a template project for scikit-learn compatible extensions. See also: http://contrib.scikit-learn.org/project-template/.

Installation and Usage

The package by itself comes with a single module and a classifier. Before installing the module you will need numpy and scipy. To install the module execute:

python setup.py install

or

pip install pruneabletree

or (when using Anaconda)

conda develop /path/to/vhsven-sklearn

If the installation is successful, and scikit-learn is correctly installed, you should be able to execute the following in Python:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from pruneabletree import PruneableDecisionTreeClassifier
>>> clf = PruneableDecisionTreeClassifier(random_state=0, prune='rep')
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)

Developers will need to install additional packages to contribute to the code. When using Anaconda, create a new environment based on the environment_dev.yml file. Else, take a look at the Makefile to see which packages were originally installed.

Documentation

The documentation is built using sphinx. It incorporates narrative documentation from the doc/ directory, standalone examples from the examples/ directory, and API reference compiled from estimator docstrings.

The online documentation can be found here.

To build the documentation locally, ensure that you have sphinx, sphinx-gallery and matplotlib by executing:

pip install sphinx matplotlib sphinx-gallery

You can also add code examples in the examples folder. All files inside the folder of the form plot_*.py will be executed and their generated plots will be available for viewing in the /auto_examples URL.

To build the documentation locally execute

cd doc
make html

The project uses CircleCI to build its documentation from the master branch and host it using GitHub Pages.