
Tree-based algorithms with categorical support


[WIP] Trees, averaging trees and boosted trees from scratch

1. Usage

Prepare the data. There are three features: the first two are numerical and the last is nominal.

>>> import numpy as np
>>> X = np.array([[  1,   1,   0],
...               [101, 101,   0],
...               [103, 103,   0],
...               [  3,   3,   0],
...               [  5,   5,   0],
...               [107, 107,   0],
...               [109, 109,   0],
...               [  7,   7,   1],
...               [  8,   8,   1]])
>>> y = np.array([0, 1, 1, 0, 0, 1, 1, 2, 2])

Import the classifier

>>> from trees_and_forests import DecisionTreeClassifier

Initialise the classifier and fit it to the data

>>> clf = DecisionTreeClassifier()
>>> clf.fit(X, y)

Predict the class of a sample

>>> clf.predict(np.array([[1, 1, 0]]))
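To illustrate what "categorical support" can mean for the toy data above, here is a minimal sketch comparing a threshold split on a numerical feature with an equality split on the nominal feature, each scored by weighted Gini impurity. This is not taken from this package, whose internals are not described here: the helper names `gini` and `weighted_gini`, the threshold of 50 and the choice of Gini impurity as the criterion are all assumptions made for illustration.

```python
from collections import Counter

# Same toy data as above, as plain lists: two numerical features, one nominal.
X = [[1, 1, 0], [101, 101, 0], [103, 103, 0], [3, 3, 0], [5, 5, 0],
     [107, 107, 0], [109, 109, 0], [7, 7, 1], [8, 8, 1]]
y = [0, 1, 1, 0, 0, 1, 1, 2, 2]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(mask, labels):
    """Size-weighted Gini impurity of the two groups selected by a boolean mask."""
    left = [label for keep, label in zip(mask, labels) if keep]
    right = [label for keep, label in zip(mask, labels) if not keep]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Numerical feature: values are ordered, so split on a threshold
# (50 is a hypothetical value chosen for illustration).
numerical_split = weighted_gini([row[0] <= 50 for row in X], y)

# Nominal feature: categories have no order, so split on equality with one category.
nominal_split = weighted_gini([row[2] == 1 for row in X], y)

print(f"parent impurity:      {gini(y):.3f}")
print(f"x0 <= 50 (numerical): {numerical_split:.3f}")
print(f"x2 == 1 (nominal):    {nominal_split:.3f}")
```

Both candidate splits lower the impurity relative to the unsplit data (about 0.642). On this data the threshold split scores about 0.267 and the nominal split about 0.381, so a greedy tree using this criterion would pick the threshold split first.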

2. Would-like-to-do-but-not-sure-when's


Algorithms

  • Decision tree classifier
  • Decision tree regressor
  • Simple bagging
  • Random forest
  • Extremely randomised trees
  • AdaBoost
  • Gradient boosting

Software development

  • Unit tests
  • API design document
  • Tutorial


Performance

  • Cythonise/PyTorchify
  • Performance against scikit-learn

3. Related


4. Resources

  • http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
  • https://scikit-learn.org