Error when passing model/data to break_down function
hbaniecki opened this issue · 8 comments
From break_down() examples:
This throws a note:
library("iBreakDown")
library("DALEX")
library("randomForest")
set.seed(1313)
model <- randomForest(status ~ . , data = HR)
new <- HR_test[1,]
explainer_rf <- explain(model,
data = HR[1:1000,1:5],
y = HR$status[1:1000])
Please note that 'y' is a factor.[...]
This works:
break_down(explainer_rf, new)
This throws an error:
break_down(x=model, data = HR[1:1000,1:5], predict_function = predict, new_observation = new)
Error in break_down.default(x = model, data = HR[1:1000, 1:5], new_observation = new, :
promise already under evaluation: recursive default argument reference or earlier problems?
This throws an error:
local_attributions(x=model, data = HR[1:1000,1:5], predict_function = predict, new_observation = new)
Error in colMeans(yhatpred) : 'x' must be numeric
I guess the problem is in the predict function for randomForest.
For classification, predict.randomForest returns classes by default, not scores, and the other functions cannot calculate averages from classes.
DALEX has an overloaded yhat function for prediction, and DALEX:::yhat.randomForest recognizes classification forests. In that case it uses pred <- predict(X.model, newdata, type = "prob", ...).
So this example shows that it is better to use the DALEX adapters; otherwise the user needs to define their own predict_function for some models.
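As a rough illustration of what such a user-defined predict function could look like, here is a minimal sketch. It uses the built-in iris data rather than HR, and the wrapper name predict_rf_prob is hypothetical (not part of DALEX or iBreakDown).

```r
library(randomForest)

set.seed(1313)
# Toy classification forest on iris (an assumption; the thread uses HR).
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# Default predict(): a factor of class labels, which cannot be averaged.
classes <- predict(rf, iris[1:3, ])
class(classes)

# Hypothetical wrapper: ask for class probabilities instead.
predict_rf_prob <- function(model, newdata) {
  predict(model, newdata, type = "prob")
}

probs <- predict_rf_prob(rf, iris[1:3, ])
colMeans(probs)  # numeric matrix, so column means are defined
```

A wrapper like this could then be passed as predict_function in place of the bare predict used in the failing calls above.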
This example shows the real problem:
library("iBreakDown")
library("DALEX")
titanic <- na.omit(titanic)
set.seed(1313)
titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1,2,3,6,7,9)]
new <- titanic_small[1,]
model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare + class + sibsp,
data = titanic_small, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_small[,-6],
label = "glm")
break_down(explain_titanic_glm, new)
                     contribution
glm: intercept              0.298
glm: age = 50              -0.118
glm: gender = male         -0.086
glm: class = 2nd           -0.046
glm: sibsp = 0              0.005
glm: fare = 13              0.000
glm: prediction             0.053
break_down(model_titanic_glm,
data = titanic_small[,-6],
label = "glm",
new_observation = new)
Error
local_attributions(model_titanic_glm,
data = titanic_small[,-6],
label = "glm",
new_observation = new)
                     contribution
glm: intercept             -1.139
glm: age = 50              -1.043
glm: gender = male         -0.613
glm: class = 2nd           -0.284
glm: sibsp = 0              0.195
glm: fare = 13              0.010
glm: prediction            -2.873
Shouldn't the output be the same? Should a warning be added?
The difference between
break_down(explain_titanic_glm, new)
and
local_attributions(model_titanic_glm,
data = titanic_small[,-6],
label = "glm",
new_observation = new)
is in the predict function.
By default, DALEX::yhat returns probabilities for the given class, while local_attributions by default uses stats:::predict.glm, which returns predictions on the link scale (i.e. logit(probability)).
This may be surprising, but again, IMHO it's the reason why we should use a DALEX explainer; otherwise we need to guess what is returned by a specific predict(). With DALEX::explain we always get probabilities by default, so we can compare probabilities between different models.
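The link-versus-probability difference can be reproduced with base R alone. The sketch below uses the built-in mtcars data instead of titanic, and the wrapper name predict_glm_prob is hypothetical.

```r
# Toy logistic regression on mtcars (an assumption; the thread uses titanic).
fit <- glm(am ~ wt, data = mtcars, family = "binomial")
new_obs <- mtcars[1:3, ]

link <- predict(fit, new_obs)                     # default type = "link": log-odds
prob <- predict(fit, new_obs, type = "response")  # probabilities

# The two scales are related by the inverse logit.
all.equal(plogis(link), prob)

# Hypothetical wrapper that always returns probabilities.
predict_glm_prob <- function(model, newdata) {
  predict(model, newdata, type = "response")
}

# Applied to the link-scale prediction reported above:
plogis(-2.873)  # about 0.0535
```

This is consistent with the two outputs above: the link-scale prediction of -2.873 maps to roughly the 0.053 probability reported for the explainer, up to rounding. Passing a wrapper such as predict_glm_prob as predict_function to local_attributions() should therefore put both outputs on the probability scale.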
It's not clear to me why
break_down(model_titanic_glm,
data = titanic_small[,-6],
label = "glm",
new_observation = new)
is not working. This is tricky.
There could be a warning when DALEX::explain() is not used.
Fixed the error with #44
Right, but then all *.default functions from auditor, iBreakDown, and ingredients need to warn the user if the default predict() is used. Let's discuss this f2f.
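One possible shape for such a warning, purely as a sketch: the function name predict_with_warning and its arguments are hypothetical and do not reflect how auditor, iBreakDown, or ingredients actually implement their *.default methods.

```r
# Hypothetical helper: warn when falling back to the generic predict().
predict_with_warning <- function(x, data, predict_function = NULL) {
  if (is.null(predict_function)) {
    warning("No predict_function supplied; using stats::predict(), ",
            "whose output scale depends on the model class.")
    predict_function <- stats::predict
  }
  predict_function(x, data)
}

fit <- lm(mpg ~ wt, data = mtcars)

# Falls back to predict() and emits the warning.
msg <- tryCatch(
  predict_with_warning(fit, mtcars[1:2, ]),
  warning = function(w) conditionMessage(w)
)

# No warning when the predict function is given explicitly.
ok <- predict_with_warning(fit, mtcars[1:2, ], predict_function = predict)
```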
This was resolved with the new DALEX::explain().