ModelOriented/iBreakDown

Error-message: subscript out of bounds

Closed this issue · 4 comments

When I try to explain an XGBoost model fitted on the well-known Diabetes dataset, I get the error "subscript out of bounds". See the code below.

library(tidyverse)
library(Hmisc)
library(xgboost)
library(iBreakDown)
library(tictoc)
library(recipes)

Load dataset

#Diabetes <- read_csv("https://www.kaggle.com/saurabh00007/diabetescsv/diabetes.csv")
Diabetes <- read_csv("diabetes.csv")

Summarise dataset

d <- describe(Diabetes)
plot(d)

Data Pre-processing, bring outliers back to values within certain range

Diabetes_Recept <- recipe(Outcome ~ ., data = Diabetes) %>%
  step_range(Pregnancies, min = 0, max = 10) %>%
  step_range(Glucose, min = 80, max = 150) %>%
  step_range(BloodPressure, min = 50, max = 100) %>%
  step_range(SkinThickness, min = 10, max = 50) %>%
  step_range(Insulin, min = 10, max = 200) %>%
  step_range(Age, min = 20, max = 70) %>%
  step_range(BMI, min = 20, max = 55)

Diabetes_prep <- prep(x = Diabetes_Recept,
                      training = Diabetes)

Diabetes_bake <- bake(object = Diabetes_prep,
                      new_data = Diabetes)
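
A side note on the pre-processing step above: as far as I understand, recipes::step_range() rescales each column linearly onto [min, max] rather than clipping outliers to that range. The difference can be sketched in base R; the helper names `rescale_to_range` and `clip_to_range` are made up for this illustration and are not part of recipes.

```r
# Hypothetical helpers (base R only) contrasting two ways to force values into a range.

# Linear min-max rescaling: every value moves, the smallest lands on `lower`,
# the largest on `upper`.
rescale_to_range <- function(x, lower, upper) {
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

# Clipping: only values outside [lower, upper] are changed.
clip_to_range <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

glucose <- c(0, 85, 120, 199)  # toy values, not taken from the dataset

rescaled <- rescale_to_range(glucose, lower = 80, upper = 150)
clipped  <- clip_to_range(glucose, lower = 80, upper = 150)

print(rescaled)
print(clipped)
```

If the goal is only to pull outliers back into a range, the clipping variant is probably closer to what the heading describes.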

Prepare for modeling

Y.train <- Diabetes$Outcome
features <- select(Diabetes_bake, -Outcome)
X.train <- features %>% data.matrix()

Fit XGBoost Model

tic()
set.seed(12)

param <- list(objective = "binary:logistic", # For classification
              eval_metric = "auc", # auc is used for classification
              max_depth = 4,
              eta = 0.3, # Learning rate
              subsample = 0.8,
              colsample_bytree = 0.8,
              min_child_weight = 2,
              scale_pos_weight = sum(Y.train == 0) / sum(Y.train == 1), # Class-imbalance weight (was misspelled scale_pow_weight)
              max_delta_step = 8)

XGB_Model <- xgboost(data = X.train, label = Y.train, params = param, nrounds = 100, verbose = 0)

toc()

Look at the shap plots

xgb.plot.shap(data = X.train,
              model = XGB_Model,
              top_n = 8,
              n_col = 2,
              ylab = "Probability of Diabetes")

Make explain object

predict_logit <- function(model, x) {
  # With objective = "binary:logistic", predict() already returns probabilities,
  # so no additional logistic transform is applied here.
  predict(model, x)
}

Explainer_XGB <- DALEX::explain(model = XGB_Model,
                                label = "Extreme Gradient Boosting",
                                data = X.train,
                                predict_function = predict_logit,
                                y = Diabetes$Outcome)

predictions <- predict(XGB_Model, newdata = X.train) # Already probabilities for binary:logistic

case1 <- as.matrix(X.train[1,])

Explain model outcomes on individual case level

After running the next command I get the error message

explain1 <- break_down(x = Explainer_XGB,
                       new_observation = case1,
                       interactions = FALSE)

plot(explain1,
     max_features = 5,
     vcolors = c("green", "red", "purple"))

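Thanks for the reproducible example.
new_observation should be a row, not a column. X.train[1,] drops to a plain vector, and as.matrix() turns that vector into a one-column matrix, so it needs to be transposed:
case1 <- t(as.matrix(X.train[1,]))
This works for me.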
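
To illustrate the shape problem without the full dataset, here is a small base R sketch. The matrix `X` is a toy stand-in for X.train, not the real data; it also shows `drop = FALSE` as an alternative to the transpose.

```r
# Toy stand-in for X.train: 2 observations, 3 named features.
X <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# X[1, ] drops to a named vector; as.matrix() makes it a 3 x 1 column.
col_case <- as.matrix(X[1, ])

# Fix from the answer above: transpose back into a 1 x 3 row.
row_case_t <- t(as.matrix(X[1, ]))

# Alternative: keep the matrix structure while subsetting.
row_case <- X[1, , drop = FALSE]

print(dim(col_case))    # 3 1
print(dim(row_case_t))  # 1 3
print(dim(row_case))    # 1 3
```

Both row versions keep the feature names as column names, which a column-shaped observation does not have.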

@hbaniecki thanks for the super quick answer. Can we do anything on our side to correct the error or (maybe even better) return a more informative error message?
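
One possible shape for such a check, as a rough sketch only: the function `check_new_observation` below is hypothetical and is not how iBreakDown is implemented. It assumes the observation should be a one-row matrix or data frame with as many columns as the explainer's data.

```r
# Hypothetical input check (not part of iBreakDown): fail early with a hint
# instead of a bare "subscript out of bounds".
check_new_observation <- function(new_observation, data) {
  if (is.null(dim(new_observation))) {
    stop("`new_observation` is a plain vector; pass a one-row matrix or ",
         "data frame, e.g. data[i, , drop = FALSE].", call. = FALSE)
  }
  if (nrow(new_observation) != 1L || ncol(new_observation) != ncol(data)) {
    stop(sprintf(
      paste0("`new_observation` has dimensions %d x %d, but 1 x %d was expected. ",
             "Did you pass a column instead of a row? Try t(new_observation)."),
      nrow(new_observation), ncol(new_observation), ncol(data)
    ), call. = FALSE)
  }
  invisible(new_observation)
}

# Toy data standing in for the explainer's data.
toy <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
bad_case  <- as.matrix(toy[1, ])      # 3 x 1 column, as in the report
good_case <- toy[1, , drop = FALSE]   # 1 x 3 row

check_new_observation(good_case, toy)  # passes silently

msg <- tryCatch(check_new_observation(bad_case, toy),
                error = function(e) conditionMessage(e))
print(msg)
```

Whether to stop with a message like this or to transpose silently is a design choice; an explicit error seems safer, since a silent transpose could hide a genuinely wrong input.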