online-ml/river

Global Split Criteria for Decision Trees

danielnowakassis opened this issue · 0 comments

In #1610, we noted that each time a split attempt occurs in decision trees, a new instance of the split criterion object is created.

This is unnecessary; a single, globally instantiated split criterion object would reduce the memory usage of LAST.

In river/tree/nodes/last_nodes.py, the leaf currently stores its own criterion:

# change this in a future PR by accessing the tree parameter in the leaf
self.split_criterion = (
    split_criterion  # if None, the change detector will have binary inputs
)
def learn_one(self, x, y, *, w=1, tree=None):
    self.update_stats(y, w)
    if self.is_active():
        if self.split_criterion is None:
            # Binary input: the change detector monitors misclassifications.
            mc_pred = self.prediction(x)
            detector_input = max(mc_pred, key=mc_pred.get) != y
            self.change_detector.update(detector_input)
        else:
            # The change detector monitors the merit computed by the leaf's
            # own split criterion instance.
            detector_input = self.split_criterion.current_merit(self.stats)
            self.change_detector.update(detector_input)
        self.update_splitters(x, y, w, tree.nominal_attributes)

would become:

def learn_one(self, x, y, *, w=1, tree=None):
    self.update_stats(y, w)
    if self.is_active():
        if tree.track_error:
            # Binary input, as before.
            mc_pred = self.prediction(x)
            detector_input = max(mc_pred, key=mc_pred.get) != y
            self.change_detector.update(detector_input)
        else:
            # The merit now comes from the tree's single, globally
            # instantiated split criterion.
            detector_input = tree.current_merit(self.stats)
            self.change_detector.update(detector_input)
        self.update_splitters(x, y, w, tree.nominal_attributes)
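
For reference, the tree-side counterpart of this change could look roughly like the sketch below. This is only an illustration of the idea, not the actual river API: the class name LASTSketch, the track_error property, and the delegation in current_merit are assumptions about how the globally instantiated criterion might be exposed to the leaves.

class LASTSketch:
    # Hypothetical sketch (illustrative only): the tree owns a single split
    # criterion instance and exposes the two hooks used in learn_one above.

    def __init__(self, split_criterion=None):
        # One globally instantiated split criterion shared by every leaf.
        # If None, leaves feed binary (misclassification) inputs to their
        # change detector, mirroring the current per-leaf behaviour.
        self._split_criterion = split_criterion

    @property
    def track_error(self):
        # Leaves check this flag instead of storing their own criterion.
        return self._split_criterion is None

    def current_merit(self, stats):
        # Delegate to the shared criterion, so no per-leaf copies are created.
        return self._split_criterion.current_merit(stats)

With something along these lines, each leaf only keeps a reference to the tree, and the split criterion is instantiated exactly once per tree rather than once per split attempt.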