bethatkinson/rpart

Error in prune.rpart: subscript out of bounds

kurpav00 opened this issue · 1 comment

Hello,

When I try to build an rpart model with my data, the prune.rpart function sometimes triggers the "subscript out of bounds" error. This problem has already been mentioned here: #4. Unlike the author of the previous issue, I can provide a reproducible example including the data; see https://pastebin.com/d9jfCnNe. The prune function does not always fail (you have to try several times, which is why the code uses a for loop), but when it does, it says:

Error in `[<-`(`*tmp*`, max(keep), 1L, value = cp) : 
  subscript out of bounds

Thank you in advance for any help.

I was able to reproduce this bug. It happens when cptable[, 1] contains duplicated values that are at least as large as the pruning cp. In

rpart/R/prune.rpart.R

Lines 9 to 12 in 3980685

temp <- pmax(tree$cptable[, 1L], cp)
keep <- match(unique(temp), temp)
newx$cptable <- tree$cptable[keep, , drop = FALSE]
newx$cptable[max(keep), 1L] <- cp

match returns only the first index of each unique(temp) value in temp. Subsetting with cptable[keep, ...] therefore removes not only the rows with cp values below cp but also some interior rows, so that max(keep) > length(keep) and the assignment on the last line indexes past the end of the subsetted table. I'm submitting PR #29, which implements the fix noted in issue #4 and fixes the instance of the bug reported here.
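The indexing problem can be seen without fitting a tree. Below is a minimal sketch that assumes a made-up cptable with a duplicated CP value above the pruning cp; the numbers are illustrative and are not taken from the pastebin data. The last line, which indexes with length(keep) instead of max(keep), is one possible way to avoid the error and is an assumption here, not a confirmed copy of what PR #29 changes.

```r
# Hypothetical cptable: the CP value 0.2 appears twice, both above cp.
cptable <- cbind(CP     = c(0.5, 0.2, 0.2, 0.1, 0.01),
                 nsplit = c(0, 1, 3, 4, 6))
cp <- 0.05

# Same steps as in prune.rpart.R, lines 9 to 11.
temp <- pmax(cptable[, 1L], cp)        # values below cp are raised to cp
keep <- match(unique(temp), temp)      # first index of each unique value
pruned <- cptable[keep, , drop = FALSE]

# keep skips row 3 (the duplicate), so max(keep) exceeds nrow(pruned)
# and the original assignment indexes a row that no longer exists.
res <- try(pruned[max(keep), 1L] <- cp, silent = TRUE)
failed <- inherits(res, "try-error")

# Assumed alternative (not confirmed as the PR's change):
# address the last row of the subsetted table directly.
pruned[length(keep), 1L] <- cp
```

With this toy input, keep has four entries while max(keep) is 5, which is the situation that triggers the "subscript out of bounds" error.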