bgreenwell/fastshap

fastshap handling nominal variables

Npaffen opened this issue · 1 comment

How does fastshap handle nominal variables?

`library(tidymodels)
library(tidyverse)
library(mlbench)
library(xgboost)
library(lightgbm)
library(treesnip)
data(Glass)

head(Glass)
Glass$Type
rec <-recipe(RI ~., data = Glass) %>% step_scale(all_numeric())

prep_rec <- prep(rec, retain = TRUE)

split <- initial_split(Glass)

train_data <- training(split)

test_data <- testing(split)

model <-
  parsnip::boost_tree(
    mode = "regression"
  ) %>%
  set_engine("lightgbm", verbose = 0)

wf_glass <- workflow() %>%
  add_recipe(rec) %>%
  add_model(model)
fit <- wf_glass %>% parsnip::fit(data = train_data)

library(fastshap)
explain(
  object = fit %>% extract_fit_parsnip(),
  newdata = test_data %>% select(-RI) %>% as.matrix(),
  X = train_data %>% select(-RI) %>% as.matrix(),
  pred_wrapper = predict
)`

This leads to the following error: `Error in genFrankensteinMatrices(X, W, O, feature = column) : Not compatible with requested type: [type=character; target=double].`

My guess:
This might be because calling as.matrix() on a data.frame with mixed column types produces a character matrix. That matrix can't be converted to type double without losing the values of the factor columns (here, Type). On the other hand, as the error suggests, we can't use a numeric target for the regression task if all other variables are of class character.
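The coercion described above can be reproduced with base R alone. This is a minimal sketch on a made-up two-column data frame (`df`, `m_chr`, and `m_num` are hypothetical names, not part of the original code); it only illustrates how `as.matrix()` and `data.matrix()` treat a factor column, not what fastshap does internally.

```r
# Toy data frame with one numeric column and one factor column
# (hypothetical data, not the Glass data set).
df <- data.frame(
  x = c(1.5, 2.5, 3.5),
  type = factor(c("a", "b", "a"))
)

# as.matrix() needs a single storage type for the whole matrix;
# with a factor column present, every cell becomes a character string.
m_chr <- as.matrix(df)
is.character(m_chr)

# data.matrix() replaces factors with their integer codes instead,
# so the resulting matrix is numeric.
m_num <- data.matrix(df)
is.numeric(m_num)
```

Note that the integer codes from `data.matrix()` impose an arbitrary ordering on the factor levels, so whether that encoding is appropriate depends on the model.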

Am I missing something, or is this a data conversion problem?
Is there an option to specify nominal/factor variables?

Hi @Npaffen, yes, I believe you are correct. If you want to use a matrix, everything needs to be encoded numerically (e.g., like in an XGBoost model). If you have factors, you'll need to use a data frame.
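
As a rough illustration of the data frame route, here is a sketch that swaps in a plain `lm()` fit on the built-in `iris` data (where `Species` is a factor) rather than the lightgbm workflow from the question. The wrapper name `pfun` and the choice of `nsim = 10` are assumptions made for the example, and it has not been checked against the tidymodels/treesnip setup above.

```r
library(fastshap)

# Built-in iris data: Species is a factor, the other columns are numeric.
fit <- lm(Sepal.Length ~ ., data = iris)

# Feature columns only, kept as a data frame so Species stays a factor.
X <- subset(iris, select = -Sepal.Length)

# Prediction wrapper: takes the fitted model and new data,
# and returns a plain numeric vector of predictions.
pfun <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}

set.seed(101)  # the Shapley estimates are Monte Carlo based
shap <- explain(fit, X = X, pred_wrapper = pfun, newdata = X[1:5, ], nsim = 10)
dim(shap)
```

The key difference from the failing call is that `X` and `newdata` are passed as data frames, with no `as.matrix()` step, and that `pred_wrapper` is an explicit function of `object` and `newdata`.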