Error-message: subscript out of bounds
Closed this issue · 4 comments
When I try to explain an XGBoost model fitted on the famous Diabetes dataset, I get the error message "subscript out of bounds". See the code below.
library(tidyverse)
library(Hmisc)
library(xgboost)
library(iBreakDown)
library(tictoc)
library(recipes)
# Load dataset
#Diabetes <- read_csv("https://www.kaggle.com/saurabh00007/diabetescsv/diabetes.csv")
Diabetes <- read_csv("diabetes.csv")
# Summarise dataset
d <- describe(Diabetes)
plot(d)
# Data pre-processing: bring outliers back to values within a certain range
Diabetes_Recept <- recipe(Outcome ~ ., data = Diabetes) %>%
  step_range(Pregnancies, min = 0, max = 10) %>%
  step_range(Glucose, min = 80, max = 150) %>%
  step_range(BloodPressure, min = 50, max = 100) %>%
  step_range(SkinThickness, min = 10, max = 50) %>%
  step_range(Insulin, min = 10, max = 200) %>%
  step_range(Age, min = 20, max = 70) %>%
  step_range(BMI, min = 20, max = 55)
Diabetes_prep <- prep(x = Diabetes_Recept,
                      training = Diabetes)
Diabetes_bake <- bake(object = Diabetes_prep,
                      new_data = Diabetes)
# Prepare for modeling
Y.train <- Diabetes$Outcome
features <- select(Diabetes_bake, -Outcome)
X.train <- features %>% data.matrix()
# Fit XGBoost model
tic()
set.seed(12)
param <- list(objective = "binary:logistic",  # For classification
              eval_metric = "auc",            # AUC is used for classification
              max_depth = 4,
              eta = 0.3,                      # Learning rate
              subsample = 0.8,
              colsample_bytree = 0.8,
              min_child_weight = 2,
              scale_pos_weight = sum(Y.train == 0) / sum(Y.train == 1),  # Balance classes
              max_delta_step = 8)
XGB_Model <- xgboost(data = X.train, label = Y.train, params = param, nrounds = 100, verbose = 0)
toc()
# Look at the SHAP plots
xgb.plot.shap(data = X.train,
              model = XGB_Model,
              top_n = 8,
              n_col = 2,
              ylab = "Probability of Diabetes")
# Create the explainer object
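predict_logit <- function(model, x) {
  # Request raw log-odds: with objective = "binary:logistic", predict()
  # returns probabilities by default, which must not be transformed again
  raw_x <- predict(model, x, outputmargin = TRUE)
  exp(raw_x) / (1 + exp(raw_x))
}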
Explainer_XGB <- DALEX::explain(model = XGB_Model,
                                label = "Extreme Gradient Boosting",
                                data = X.train,
                                predict_function = predict_logit,
                                y = Diabetes$Outcome)
# predict() on an xgb.Booster has no type argument; it already returns probabilities here
predictions <- predict(XGB_Model, newdata = X.train)
case1 <- as.matrix(X.train[1,])
# Explain model outcomes at the individual case level
After running the next command I get the error message "subscript out of bounds":
explain1 <- break_down(x = Explainer_XGB,
                       new_observation = case1,
                       interactions = FALSE)
plot(explain1,
     max_features = 5,
     vcolors = c("green", "red", "purple"))
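Thanks for the reproducible example.
new_observation should be a row, not a column: as.matrix() turns the vector X.train[1,] into a one-column matrix, so it has to be transposed.
case1 <- t(as.matrix(X.train[1,]))
This works for me.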
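The shape difference can be checked in base R without the dataset. Below is a minimal sketch on a toy matrix; the matrix `X` and its two column names are made up for illustration and only stand in for `X.train`. It also shows `drop = FALSE`, an alternative to the transpose that keeps the row shape from the start.

```r
# Toy stand-in for X.train: 3 rows, 2 named feature columns (hypothetical data)
X <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("Glucose", "BMI")))

# X[1, ] drops to a named vector; as.matrix() then builds a 2 x 1 column
# whose feature names end up as row names, not column names
col_case <- as.matrix(X[1, ])

# Transposing turns the column back into a 1 x 2 row with column names
row_case <- t(as.matrix(X[1, ]))

# drop = FALSE keeps the matrix shape, so no transpose is needed
row_keep <- X[1, , drop = FALSE]

print(dim(col_case))
print(colnames(col_case))
print(dim(row_case))
print(colnames(row_keep))
```

Either `row_case` or `row_keep` has one row and one column per feature, which is the layout the answer above asks for.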
@hbaniecki thanks for the super quick answer. Can we do anything on our side to correct the error or (maybe even better) return a more informative error message?
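One possible shape for such a message is an up-front check of the observation against the explainer's data. This is only a sketch: `check_new_observation` is a hypothetical helper, not an existing iBreakDown or DALEX function, and how the package validates its input internally is not shown in this thread.

```r
# Hypothetical helper, not part of iBreakDown or DALEX: fail early with a
# readable message when new_observation does not look like a row of `data`
check_new_observation <- function(new_observation, data) {
  if (is.null(dim(new_observation))) {
    stop("new_observation must be a one-row matrix or data frame, not a vector; ",
         "try data[i, , drop = FALSE]", call. = FALSE)
  }
  if (ncol(new_observation) != ncol(data)) {
    stop("new_observation has ", ncol(new_observation), " column(s) but data has ",
         ncol(data), "; did you pass a column instead of a row? ",
         "Try t(new_observation)", call. = FALSE)
  }
  invisible(new_observation)
}

# Toy stand-in for X.train (hypothetical data)
X <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("Glucose", "BMI")))

# The column-shaped observation from the report is rejected with a hint
msg <- tryCatch(check_new_observation(as.matrix(X[1, ]), X),
                error = function(e) conditionMessage(e))
print(msg)

# A proper one-row observation passes through unchanged
ok <- check_new_observation(X[1, , drop = FALSE], X)
```

Comparing column counts is a deliberately cheap test; it would not catch a one-row observation whose columns are in the wrong order, so a name-based comparison might be worth adding.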