ModelOriented/iBreakDown

Error when passing model/data to break_down function

hbaniecki opened this issue · 8 comments

From break_down() examples:
This throws a note:

library("iBreakDown")
library("DALEX")
library("randomForest")
set.seed(1313)

model <- randomForest(status ~ . , data = HR)
new <- HR_test[1,]

explainer_rf <- explain(model,
                        data = HR[1:1000,1:5],
                        y = HR$status[1:1000])

Please note that 'y' is a factor.[...]

This works:

break_down(explainer_rf, new)

This throws an error:

break_down(x = model, data = HR[1:1000, 1:5], predict_function = predict, new_observation = new)

Error in break_down.default(x = model, data = HR[1:1000, 1:5], new_observation = new, :
promise already under evaluation: recursive default argument reference or earlier problems?
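
For context, here is a minimal base-R sketch of how this message arises in general. The functions `f` and `g` are hypothetical and are not taken from iBreakDown; the thread does not establish whether this is the actual cause inside break_down.default.

```r
# A default argument that refers to itself cannot be evaluated:
# forcing x evaluates its default, which asks for x again.
f <- function(x = x) x

msg <- tryCatch(f(), error = function(e) conditionMessage(e))
print(msg)

# Giving the default a different source (here a namespace-qualified
# function) avoids the self-reference.
g <- function(predict_function = stats::predict) predict_function
print(identical(g(), stats::predict))
```

Supplying the argument explicitly (for example `f(1)`) also avoids the error, because the default is then never evaluated.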

This throws an error:

local_attributions(x = model, data = HR[1:1000, 1:5], predict_function = predict, new_observation = new)

Error in colMeans(yhatpred) : 'x' must be numeric

I guess that the problem is in the predict function for randomForest.
For classification, predict.randomForest returns classes by default, not scores, and other functions cannot calculate averages from classes.

DALEX overloads the yhat function for prediction, and DALEX:::yhat.randomForest recognizes classification forests. In that case it uses pred <- predict(X.model, newdata, type = "prob", ...).

So this example shows that it is better to use DALEX adapters, because otherwise the user needs to define their own predict_function for some models.
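
The difference between class and probability predictions can be sketched as follows. This assumes the randomForest package is installed and uses the built-in iris data instead of HR so that it does not depend on DALEX. The wrapper `predict_prob` is a hypothetical helper, not part of any package.

```r
library("randomForest")

set.seed(1313)
model <- randomForest(Species ~ ., data = iris)
new <- iris[1:5, 1:4]

# Default prediction for a classification forest: a factor of class labels,
# which cannot be averaged.
pred_class <- predict(model, newdata = new)

# With type = "prob" the forest returns a numeric matrix,
# one column per class, which can be averaged column-wise.
pred_prob <- predict(model, newdata = new, type = "prob")
print(colMeans(pred_prob))

# Hypothetical user-defined predict function returning probabilities.
predict_prob <- function(m, newdata) predict(m, newdata = newdata, type = "prob")
```

A function like `predict_prob` is the kind of thing one would pass as predict_function when not going through a DALEX explainer.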

This example shows the real problem:

library("iBreakDown")
library("DALEX")

titanic <- na.omit(titanic)
set.seed(1313)
titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1,2,3,6,7,9)]
new <- titanic_small[1,]

model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare + class + sibsp,
                         data = titanic_small, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_small[,-6],
                               label = "glm")

break_down(explain_titanic_glm, new)

                    contribution
glm: intercept             0.298
glm: age = 50             -0.118
glm: gender = male        -0.086
glm: class = 2nd          -0.046
glm: sibsp = 0             0.005
glm: fare = 13             0.000
glm: prediction            0.053

break_down(model_titanic_glm,
           data =  titanic_small[,-6],
           label = "glm",
           new_observation = new)

Error

local_attributions(model_titanic_glm,
                   data =  titanic_small[,-6],
                   label = "glm",
                   new_observation = new)

                    contribution
glm: intercept            -1.139
glm: age = 50             -1.043
glm: gender = male        -0.613
glm: class = 2nd          -0.284
glm: sibsp = 0             0.195
glm: fare = 13             0.010
glm: prediction           -2.873

Shouldn't the output be the same? Should a warning be added?

The difference between

break_down(explain_titanic_glm, new)

and

local_attributions(model_titanic_glm,
                   data =  titanic_small[,-6],
                   label = "glm",
                   new_observation = new)

is in the predict function.

By default DALEX::yhat returns probabilities for the given class,
while local_attributions by default uses stats:::predict.glm, which returns values on the link scale (i.e. logit(probability)).

This may be surprising, but again IMHO it's the reason why we should use DALEX::explain(); otherwise we need to guess what is returned by a specific predict(). With DALEX::explain() we will always get probabilities by default, and we can compare probabilities between different models.
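
The link-versus-probability difference can be checked in base R. This is a minimal sketch on the built-in mtcars data rather than the titanic data above, so the numbers will not match the outputs in this thread.

```r
# Logistic regression on a built-in dataset.
model <- glm(am ~ wt, data = mtcars, family = "binomial")
new <- mtcars[1:3, ]

# Default predict.glm: values on the link (logit) scale.
link_scale <- predict(model, newdata = new)

# type = "response": probabilities.
prob_scale <- predict(model, newdata = new, type = "response")

# The two are related by the inverse logit.
print(all.equal(plogis(link_scale), prob_scale))

# The baseline differs too: the mean of the logits is not
# the logit of the mean probability.
mean_link <- mean(predict(model))
logit_of_mean_prob <- qlogis(mean(predict(model, type = "response")))
print(c(mean_link = mean_link, logit_of_mean_prob = logit_of_mean_prob))
```

The last comparison is one reason the intercept rows of the two outputs are not simple transformations of each other: each is an average taken on a different scale.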

It's not clear to me why

break_down(model_titanic_glm,
           data =  titanic_small[,-6],
           label = "glm",
           new_observation = new)

is not working. This is tricky.

There could be a warning when not using DALEX::explain().
Fixed the error with #44

Right, but then all *.default functions from auditor, iBreakDown, and ingredients need to warn the user if the default predict() is used. Let's discuss this f2f
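
One way such a warning could look, as a base-R sketch only. The function `predict_with_warning` is a hypothetical stand-in for a *.default method and does not reflect how auditor, iBreakDown, or ingredients are implemented.

```r
# Hypothetical stand-in for a *.default method: warn when the caller
# did not supply predict_function and the generic predict() is used.
predict_with_warning <- function(x, new_observation, predict_function = predict) {
  if (missing(predict_function)) {
    warning("Using the default predict(); its output scale depends on the model class. ",
            "Consider DALEX::explain() or an explicit predict_function.",
            call. = FALSE)
  }
  predict_function(x, new_observation)
}

model <- glm(am ~ wt, data = mtcars, family = "binomial")
new <- mtcars[1, ]

# Default predict(): the warning fires and the value is on the link scale.
caught <- character(0)
link_value <- withCallingHandlers(
  predict_with_warning(model, new),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Explicit predict function: no warning, and the value is a probability.
prob_value <- predict_with_warning(
  model, new,
  predict_function = function(m, x) predict(m, x, type = "response")
)
```

The sketch relies on `missing()`, which tells a caller-supplied argument apart from the default, so passing `predict` explicitly would silence the warning.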

And thanks for fix #44

This was resolved with the new DALEX::explain()