Use Case Weights To Threshold Splits
mhermher commented
If the weights passed into the model are case weights, then should they not be used to determine whether a split should happen or not?
In partition.c, me->num_obs is compared to rp.min_split; I would expect me->sum_wt to be compared instead.
Similarly, in anova.c (I haven't looked at the other split methods), right_n and left_n are compared to edge (rp.min_node); I would expect right_wt and left_wt to be compared instead.
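To make the difference concrete, here is a minimal, self-contained sketch of the two kinds of check. It is not the actual rpart source: node_t and the three function names are hypothetical, the struct holds only the two fields mentioned above, and the comparisons are assumed to have the form "size >= threshold".

```c
/* Hypothetical stand-in for a tree node; only the two fields named
 * in this issue are modeled. Not the real rpart struct. */
typedef struct {
    int    num_obs;  /* number of rows that fall in the node */
    double sum_wt;   /* sum of the case weights of those rows */
} node_t;

/* Count-based check (the behavior described in this issue):
 * the node is eligible for splitting if it has enough rows. */
int can_split_by_count(const node_t *me, int min_split)
{
    return me->num_obs >= min_split;
}

/* Weight-based check (the behavior being requested):
 * the node is eligible if its total case weight is large enough. */
int can_split_by_weight(const node_t *me, double min_split)
{
    return me->sum_wt >= min_split;
}

/* Weight-based child check: a candidate split is acceptable only if
 * both children carry at least min_node total case weight. */
int children_ok_by_weight(double left_wt, double right_wt, double min_node)
{
    return left_wt >= min_node && right_wt >= min_node;
}
```

For example, a node with 3 rows that each carry a case weight of 10 stands for 30 cases. With a min_split of 20, the count-based check refuses to split it, while the weight-based check allows the split.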
Using case weights to represent number of cases is really helpful in managing runtime and memory efficiency, but the split logic in the C code is not considering them.
Even writing a custom split function would solve only the latter case (the min_node check), not the former (the min_split check in partition.c).