Random Forests
OlexiyPukhov opened this issue · 1 comments
OlexiyPukhov commented
Is there a way to get this to also work with random forest models?
mayer79 commented
Hello. Yes and no.
- "No": shapviz itself does not calculate SHAP values, it just plots them.
- "Yes": You can calculate SHAP values with the "treeshap" package (https://github.com/ModelOriented/treeshap) and then plot them with "shapviz", which includes a wrapper for treeshap results. The problem: random forests shine when they have many, very deep trees, and calculating TreeSHAP is computationally extremely demanding in that case.
- "Yes": An alternative is model-agnostic Kernel SHAP. We have a fairly new R implementation of it in the "kernelshap" package, see Approach 1 below.
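To make "model-agnostic" concrete, here is a minimal base-R sketch of the quantity such methods target: exact Shapley values obtained by enumerating all feature coalitions and averaging over background data. This is not how the kernelshap package is implemented; the toy model and the helpers `toy_predict()`, `coalition_value()` and `exact_shap()` are hypothetical names made up for this illustration.

```r
# Hypothetical toy model: any function mapping a data.frame to numeric predictions
toy_predict <- function(df) 2 * df$x1 + df$x2 * df$x3

# Value of a coalition: features in `on` take the values of row `x`,
# all other features keep their background values; predictions are averaged
coalition_value <- function(pred_fun, x, bg, on) {
  mixed <- bg
  for (v in on) mixed[[v]] <- x[[v]]
  mean(pred_fun(mixed))
}

# Exact Shapley values for a single row `x` (enumerates 2^(p-1) coalitions per feature)
exact_shap <- function(pred_fun, x, bg) {
  feats <- names(x)
  p <- length(feats)
  phi <- setNames(numeric(p), feats)
  for (j in feats) {
    others <- setdiff(feats, j)
    for (mask in 0:(2^(p - 1) - 1)) {
      # Decode the mask into the subset of "switched on" other features
      on <- others[(mask %/% 2^(seq_along(others) - 1)) %% 2 == 1]
      s <- length(on)
      w <- factorial(s) * factorial(p - s - 1) / factorial(p)
      gain <- coalition_value(pred_fun, x, bg, c(on, j)) -
        coalition_value(pred_fun, x, bg, on)
      phi[j] <- phi[j] + w * gain
    }
  }
  phi
}

# Made-up background data and one row to explain
bg <- data.frame(x1 = c(0, 1, 2), x2 = c(1, 1, 3), x3 = c(0, 2, 2))
x <- data.frame(x1 = 3, x2 = 2, x3 = 1)

phi <- exact_shap(toy_predict, x, bg)
baseline <- mean(toy_predict(bg))

# Additivity: baseline plus the sum of SHAP values reproduces the prediction
stopifnot(isTRUE(all.equal(baseline + sum(phi), toy_predict(x))))
```

The number of model evaluations grows with the number of coalitions times the number of background rows, which is why the background data is kept small.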
# Approach 1: Kernel SHAP
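library(ranger)
library(kernelshap)
library(shapviz)
library(ggplot2)  # also provides the diamonds data
library(ggpubr)   # ggarrange()
# Features used by the model
x <- c("carat", "clarity", "color", "cut")
rf <- ranger(reformulate(x, "price"), data = diamonds)
# Rows to explain: every 50th observation
X <- diamonds[seq(1, nrow(diamonds), 50), x]
# Background data: every 500th observation
background_data <- diamonds[seq(1, nrow(diamonds), 500), ]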
# Takes about 2 minutes to decompose roughly 1000 predictions
system.time(
ks <- kernelshap(rf, X = X, bg_X = background_data)
)
# Visualization
sv <- shapviz(ks)
sv_importance(sv)
deps <- lapply(x, function(v) sv_dependence(sv, v, color_var = "auto"))
ggarrange(plotlist = deps, ncol = 2, nrow = 2)
# Approach 2: TreeSHAP
# devtools::install_github("ModelOriented/treeshap")