Permutation importance computed using test data?

Question

Yu-Liu207 opened this issue 4 years ago · 2 comments

Thank you for creating the helpful package!

I used the ranger package to train a random forest model (model_rf.train.PHQ2) on a training dataset. Can I use the vi_permute() function to compute the permutation importance of the features on a separate test dataset (LTM_listwise.test, which contains the same variables as the training dataset but different observations)?

The following R script runs, but I am not sure whether it gives me what I want.

pred_wrapper <- function(object, newdata) predict(object, newdata)

set.seed(69855)
model_rf.test.PHQ2.vip <- vi_permute(model_rf.train.PHQ2,
                                     train = LTM_listwise.test,
                                     target = LTM_listwise.test$PHQ2,
                                     metric = "mse",
                                     type = "difference",
                                     nsim = 10,
                                     pred_wrapper = pred_wrapper)

Thanks,

Yu

Answer 1 · 2021-01-18T23:22:23.000Z

Hi @Yu-Liu207, thank you and glad you find the package useful! At first glance, it looks like you’re doing it right. The permutation happens in the data set supplied via the (perhaps poorly named) train argument. As long as you supply a test set, it should be computing the scores you’re interested in.
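
To make the idea concrete, here is a minimal base-R sketch of permutation importance computed on held-out data. It is not vip's implementation. The simulated data, the lm() model standing in for the random forest, and the helper names (mse, permutation_importance) are all assumptions made for illustration. The point it shows is that the model is fitted once on the training rows, while both the baseline error and the permuted errors are measured on the test rows only.

```r
# Hand-rolled permutation importance on a held-out set (illustrative sketch only).
# All data and names below are hypothetical; lm() stands in for any fitted model.
set.seed(101)

n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 3 * dat$x1 + 0.5 * dat$x2 + rnorm(n, sd = 0.5)  # x3 is pure noise

# Split into training and test rows.
train_idx <- sample(n, size = 300)
train_df <- dat[train_idx, ]
test_df  <- dat[-train_idx, ]

# Fit on the training rows only.
fit <- lm(y ~ x1 + x2 + x3, data = train_df)

mse <- function(actual, predicted) mean((actual - predicted)^2)

# For each feature: shuffle that column in `data`, re-predict, and record the
# increase in MSE over the unshuffled baseline, averaged over `nsim` shuffles.
permutation_importance <- function(model, data, target, nsim = 10) {
  features <- setdiff(names(data), target)
  baseline <- mse(data[[target]], predict(model, newdata = data))
  sapply(features, function(feature) {
    increases <- replicate(nsim, {
      shuffled <- data
      shuffled[[feature]] <- sample(shuffled[[feature]])
      mse(data[[target]], predict(model, newdata = shuffled)) - baseline
    })
    mean(increases)
  })
}

# Passing the test set here is what makes these test-set importances.
importance_test <- permutation_importance(fit, test_df, target = "y")
print(sort(importance_test, decreasing = TRUE))
```

Passing train_df instead of test_df to the helper would give training-set importances from the same fitted model. That mirrors the choice of which data frame goes into the train argument of vi_permute().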

Answer 2 · 2021-01-18T23:42:01.000Z

Great, thanks!