fastshap handling nominal variables
Npaffen opened this issue · 1 comment
How does fastshap handle nominal variables?
```r
library(tidymodels)
library(tidyverse)
library(mlbench)
library(xgboost)
library(lightgbm)
library(treesnip)  # provides the "lightgbm" engine for parsnip

data(Glass)
head(Glass)
Glass$Type  # Type is a factor (the glass category)

# Scale all numeric columns; Type stays a factor
rec <- recipe(RI ~ ., data = Glass) %>% step_scale(all_numeric())
prep_rec <- prep(rec, retain = TRUE)

split <- initial_split(Glass)
train_data <- training(split)
test_data <- testing(split)

model <- parsnip::boost_tree(mode = "regression") %>%
  set_engine("lightgbm", verbose = 0)

wf_glass <- workflow() %>%
  add_recipe(rec) %>%
  add_model(model)

fit <- wf_glass %>% parsnip::fit(data = train_data)

library(fastshap)
# as.matrix() on a data frame that still contains the factor Type
explain(
  object = fit %>% extract_fit_parsnip(),
  newdata = test_data %>% select(-RI) %>% as.matrix(),
  X = train_data %>% select(-RI) %>% as.matrix(),
  pred_wrapper = predict
)
```
This leads to the following error: `Error in genFrankensteinMatrices(X, W, O, feature = column) : Not compatible with requested type: [type=character; target=double].`
My guess:
This might be because calling `as.matrix()` on a data.frame whose columns have different types produces a character matrix. That matrix can't be converted to a matrix of type double without losing the values of the factor column (here, `Type`). The error message points the same way: `target=double` is the type fastshap's internal routine requested, but it received a matrix of type `character` instead.
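The coercion described above can be checked in base R without fitting any model. This is a minimal sketch on a small made-up data frame that mimics Glass (two numeric columns plus a factor `Type`); the values are illustrative only.

```r
# Toy data frame with made-up values, mirroring Glass: numeric predictors plus a factor
df <- data.frame(
  Na   = c(13.64, 13.89, 13.53),
  Mg   = c(4.49, 3.60, 3.55),
  Type = factor(c("1", "2", "7"))
)

# With a factor present, as.matrix() converts every column to character
m_mixed <- as.matrix(df)
typeof(m_mixed)    # "character"

# With only numeric columns, as.matrix() keeps a double matrix
m_numeric <- as.matrix(df[, c("Na", "Mg")])
typeof(m_numeric)  # "double"
```

So the numeric columns are turned into strings as well, not just `Type`, which is why a routine expecting a double matrix rejects the whole object.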
Am I missing something, or is this a conversion problem?
Is there an option to specify nominal/factor variables?
Hi @Npaffen, yes, I believe you are correct. If you want to use a matrix, everything needs to be encoded numerically (e.g., as for an XGBoost model). If you have factors, you'll need to use a data frame.
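The two routes in the answer can be sketched in base R. This is an illustration under stated assumptions, not fastshap's own code: one-hot encoding with `model.matrix()` is one common way to get an all-numeric matrix (a swapped-in choice; other encodings work too), and the data-frame route is shown with `lm()` as a hypothetical stand-in model. The wrapper `pfun` is a hypothetical name; it follows fastshap's documented `pred_wrapper` convention of a `function(object, newdata)` that returns a numeric vector of predictions.

```r
# Toy data with made-up values: one numeric predictor and one nominal predictor
df <- data.frame(
  Na   = c(13.64, 13.89, 13.53, 13.21),
  Type = factor(c("1", "2", "7", "2"))
)

# Route 1: encode everything numerically.
# "- 1" drops the intercept, so each level of Type becomes its own 0/1 column.
X_num <- model.matrix(~ . - 1, data = df)
typeof(X_num)  # "double"

# Route 2: keep a data frame so Type stays a factor.
# lm() is only a stand-in for a model that accepts factor predictors.
fit <- lm(Na ~ Type, data = df)

# Hypothetical prediction wrapper: takes (object, newdata), returns a plain numeric vector
pfun <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}
preds <- pfun(fit, df)
```

If you go the matrix route, encode the training and test data with the same factor levels so that both matrices end up with identical columns.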