ogrisel/pygbm

Optimize score loss computation

Slightly related to #76

This is the second bullet point from #69 (comment)

When early stopping (or just score monitoring) is done on the training data using the loss, we should reuse the raw_predictions array that fit() already holds instead of recomputing it.

Results would differ slightly from the current implementation, which computes the loss on a subset of the training data rather than on the whole training set.
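To make the idea concrete, here is a minimal NumPy-only sketch of a boosting loop that scores the training loss directly from the raw_predictions it is already updating. It is not pygbm code: `fit_with_train_score_monitoring`, `least_squares_loss` and the one-split "stump" weak learner are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import numpy as np


def least_squares_loss(y_true, raw_predictions):
    """Mean half squared error, computed directly from raw predictions."""
    return 0.5 * np.mean((y_true - raw_predictions) ** 2)


def fit_with_train_score_monitoring(X, y, n_iterations=5, learning_rate=0.5):
    """Toy stand-in for fit() (hypothetical, not the pygbm implementation)."""
    # Baseline prediction: the mean of the targets.
    raw_predictions = np.full(y.shape[0], y.mean())

    # Hypothetical weak learner: a fixed split on the median of feature 0.
    left = X[:, 0] <= np.median(X[:, 0])

    train_scores = [least_squares_loss(y, raw_predictions)]
    for _ in range(n_iterations):
        residuals = y - raw_predictions
        # Each side of the split predicts its mean residual.
        update = np.where(left, residuals[left].mean(), residuals[~left].mean())
        raw_predictions += learning_rate * update

        # Key point: raw_predictions is already up to date for the whole
        # training set, so no extra predict pass is needed to get the score.
        train_scores.append(least_squares_loss(y, raw_predictions))
    return raw_predictions, train_scores
```

In this sketch the score is computed on every training sample, which is why the numbers would not match a version that scores only a subsample.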

A further optimization would be to avoid calling loss_.__call__() altogether: compute the per-sample loss inside e.g. loss_.update_gradients_and_hessians and reuse those values when computing the gradients and hessians. The extra overhead would be minimal this way.
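As a rough illustration of that second idea, the sketch below computes the per-sample binary cross-entropy once and derives the gradients and hessians from the same intermediate value. The function name `update_gradients_hessians_and_loss` and its signature are assumptions made for this example, not the existing pygbm API, and the choice of the logistic loss is only illustrative.

```python
import numpy as np


def update_gradients_hessians_and_loss(gradients, hessians, y_true,
                                       raw_predictions):
    """Fill gradients and hessians in place and return the mean loss.

    Hypothetical helper for the binary logistic loss with y_true in {0, 1}.
    """
    # softplus(raw) = log(1 + exp(raw)), evaluated in a numerically stable way.
    softplus = np.logaddexp(0.0, raw_predictions)

    # Per-sample binary cross-entropy expressed with raw predictions.
    per_sample_loss = softplus - y_true * raw_predictions

    # sigmoid(raw) = exp(raw - softplus(raw)), reusing the value above.
    proba = np.exp(raw_predictions - softplus)

    gradients[:] = proba - y_true          # first derivative w.r.t. raw
    hessians[:] = proba * (1.0 - proba)    # second derivative w.r.t. raw
    return per_sample_loss.mean()
```

The loss comes almost for free here because softplus is needed for the probabilities anyway; the only extra work is one multiply, one subtract and a mean over the samples.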