Implementation of a decision tree ensemble which splits each node using learned linear and non-linear functions.
Once downloaded, go to the LANDMark directory and type: pip install .
n_estimators: int, default = 64
The number of trees used to construct the ensemble.
min_samples_in_leaf: int, default = 5
The minimum number of samples a node must contain before it can be split.
max_depth: int, default = -1
The maximum depth of the tree. '-1' implies that trees will fully
grow until a stopping criterion is met.
max_features: float, default = 0.80
The maximum fraction of features used to train each model at each node.
These features are randomly selected at each node.
min_gain: float, default = 0
The minimum gain needed to split a node.
impurity: str, default = "gain"
The method by which LANDMark calculates the purity of each split.
Currently, only information gain is used.
use_oracle: bool, default = True
Specifies if a random linear oracle should be used as the initial
split.
use_lm_l2: bool, default = True
Specifies if linear models using L2 regularization should be trained.
use_lm_l1: bool, default = True
Specifies if linear models using L1 regularization should be trained.
use_nnet: bool, default = True
Specifies if neural network models should be trained.
nnet_min_samples: int, default = 32
The minimum number of samples needed to train a neural network model.
use_etc: bool, default = True
Specifies if Extremely Randomized Tree models should be trained. These
can be used as an alternative to neural network models.
bootstrap: bool, default = False
Specifies if each tree is built using a bootstrap sample.
n_jobs: int, default = 4
The number of processes used to create the LANDMark model.
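The 'impurity' and 'min_gain' parameters both refer to information gain. As a rough illustration only, the sketch below computes the entropy-based information gain of a binary split and compares it with a threshold. The helper names (entropy, information_gain, accept_split) are hypothetical and are not part of LANDMark, and the strict "greater than" comparison is an assumption rather than the library's documented behaviour.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def accept_split(parent, left, right, min_gain=0.0):
    """Hypothetical rule: keep a split only if its gain exceeds min_gain."""
    return information_gain(parent, left, right) > min_gain


parent = ["a", "a", "b", "b"]
# A split that separates the two classes completely.
perfect_gain = information_gain(parent, ["a", "a"], ["b", "b"])
# A split that leaves both children as mixed as the parent.
useless_gain = information_gain(parent, ["a", "b"], ["a", "b"])
```

Under this assumed rule, the first split would be kept and the second rejected when min_gain is 0.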
X: NumPy array of shape (m, n) where 'm' is the number of samples and 'n'
is the number of features (taxa, OTUs, ASVs, etc.).
y: NumPy array of shape (m,) where 'm' is the number of samples. Each entry
of 'y' should be a class label (a factor).
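The shapes described above can be checked before fitting. The sketch below builds a small, made-up dataset and integer-codes its string labels with NumPy. The data and label names are hypothetical, and whether LANDMark accepts string labels directly is not stated here, so integer coding is a conservative assumption.

```python
import numpy as np

# Hypothetical data: 4 samples (rows) and 3 features (columns),
# for example counts per taxon.
X = np.array(
    [[5, 0, 2],
     [3, 1, 0],
     [0, 4, 4],
     [1, 6, 3]],
    dtype=float,
)
raw_labels = ["gut", "gut", "soil", "soil"]

# np.unique with return_inverse=True returns the sorted class names and,
# for each sample, the index of its class in that sorted array.
classes, y = np.unique(raw_labels, return_inverse=True)

# X must have shape (m, n) and y must have shape (m,) with the same m.
assert X.ndim == 2
assert y.shape == (X.shape[0],)
```

The original label for any integer code can be recovered with classes[y].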
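from LANDMark import LANDMarkClassifier
from sklearn.datasets import make_classification
# Generate a synthetic classification dataset
X, y = make_classification()
# Fit a LANDMark ensemble using the default parameters
clf = LANDMarkClassifier().fit(X, y)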
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn:
Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.
Kuncheva LI, Rodriguez JJ. Classifier ensembles with a random linear oracle.
IEEE Transactions on Knowledge and Data Engineering. 2007;19(4):500–8.
Geurts P, Ernst D, Wehenkel L. Extremely Randomized Trees. Machine Learning. 2006;63(1):3–42.