subsampling
Closed this issue · 2 comments
Great project!
I've got two questions, one about the "weights" argument in rforest and one about subsampling.
- This code snippet from rforest:
data = substitute(data[sample.int(nrow(data), size = subsamp * nrow(data), replace = subsamp < 1)
For subsample = 1, I'd expect sampling with replacement (as in bootstrapping or bagging). In the code above, however, replace is then set to FALSE. Is this intended? How should I set subsample to get bootstrap sampling?
- How do I use the "weights" argument? The rforest code is like this:
'weights' = unname(as.list(match.call())[['weights']])
Is the weights vector really resampled along with the rows of the data?
Thanks for your feedback! Below are my answers to your questions:
1: The rforest.R code has been updated to always subsample with replacement, so bootstrapping is now possible with subsample = 1.
2: To use weights, your data should contain a variable holding the weights, for example one named weights_var. Then you simply specify weights = weights_var in the function call, much as you would in the original rpart function call or in most other modelling algorithms. Because the weights are a column of the data, they are sampled together with the rows, so the correctly sampled values are used during model training.
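The two answers above can be illustrated with a minimal sketch. This is not the actual rforest.R code: the data frame dat, its columns, the seed, and the floor() rounding of the sample size are assumptions made only for illustration. It draws a bootstrap sample of row indices with replacement and checks that a weights column stored in the data stays aligned with its sampled rows.

```r
# Hypothetical toy data; `dat` and the values in `weights_var` are made up.
set.seed(1)
dat <- data.frame(
  x = 1:10,
  y = rnorm(10),
  weights_var = seq(0.1, 1, length.out = 10)
)

# Fraction of rows drawn per tree; 1 gives a full-size bootstrap sample.
subsamp <- 1

# Draw row indices with replacement, as in bagging.
idx <- sample.int(nrow(dat), size = floor(subsamp * nrow(dat)), replace = TRUE)

# Subset whole rows so that each weight travels with its observation.
boot <- dat[idx, ]

# The sampled weights equal the original weights of the sampled rows.
stopifnot(identical(boot$weights_var, dat$weights_var[idx]))

# Fit one tree on the sample if rpart is available. rpart evaluates the
# `weights` argument inside `data`, so the column name is passed unquoted.
if (requireNamespace("rpart", quietly = TRUE)) {
  fit <- rpart::rpart(y ~ x, data = boot, weights = weights_var)
}
```

Keeping the weights as a column of the data, rather than as a separate vector, is what keeps them in step with the rows when the data is resampled.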
I hope this answers your questions. I will close this issue for now, but feel free to reopen if you have any other questions related to this!
Wonderful - both issues brilliantly solved. Thanks a lot.