ModelOriented/shapviz

Random Forests

OlexiyPukhov opened this issue · 1 comments

Is there a way to get this to also work with random forest models?

Hello. Yes and no.

  • "No": shapviz itself does not calculate SHAP values, it just plots them.
  • "Yes": You can use the "treeshap" package (https://github.com/ModelOriented/treeshap) to calculate SHAP values and then plot them with "shapviz", which includes a wrapper for treeshap results. The catch: random forests work best with many, very deep trees, and calculating TreeSHAP is computationally extremely demanding in that setting.
  • "Yes": An alternative is model-agnostic Kernel SHAP. We have a fairly new R implementation of it, the "kernelshap" package; see the code below.
# Approach 1: Kernel SHAP

library(ranger)
library(kernelshap)
library(shapviz)
library(ggplot2)
library(ggpubr)

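# Features and model (the diamonds data ships with ggplot2)
x <- c("carat", "clarity", "color", "cut")
rf <- ranger(reformulate(x, "price"), data = diamonds)

# Rows to explain: every 50th row (1079 rows)
X <- diamonds[seq(1, nrow(diamonds), 50), x]
# Background data used to integrate out features: every 500th row (108 rows)
background_data <- diamonds[seq(1, nrow(diamonds), 500), ]

# Takes about 2 minutes to decompose these roughly 1000 predictions
system.time(
  ks <- kernelshap(rf, X = X, bg_X = background_data)
)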

# Visualization
sv <- shapviz(ks)

sv_importance(sv)
deps <- lapply(x, function(v) sv_dependence(sv, v, color_var = "auto"))
ggarrange(plotlist = deps, ncol = 2, nrow = 2)

# Approach 2: TreeSHAP

# devtools::install_github("ModelOriented/treeshap")
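
The TreeSHAP route could look roughly like the sketch below. It is not taken from the original answer. It assumes the treeshap functions ranger.unify() and treeshap(), assumes that treeshap needs purely numeric features (hence the data.matrix() conversion of the ordered factors), and uses a deliberately small and shallow forest (num.trees = 100 and max.depth = 6 are illustrative choices) so that the run time stays manageable.

```r
library(ranger)
library(treeshap)
library(shapviz)
library(ggplot2)  # provides the diamonds data

x <- c("carat", "clarity", "color", "cut")

# Assumption: treeshap handles numeric features only, so the ordered
# factors are replaced by their integer codes.
X_num <- as.data.frame(data.matrix(diamonds[x]))

# Illustrative settings: few, shallow trees keep TreeSHAP fast.
rf_small <- ranger(x = X_num, y = diamonds$price, num.trees = 100, max.depth = 6)

# Convert the forest to treeshap's unified format, then explain a subsample.
idx <- seq(1, nrow(diamonds), 50)
unified <- ranger.unify(rf_small, X_num)
ts <- treeshap(unified, X_num[idx, ], verbose = FALSE)

# Pass the original (non-numeric) feature values so plots show factor levels.
sv_tree <- shapviz(ts, X = as.data.frame(diamonds[idx, x]))
sv_importance(sv_tree)
```

With a full-size forest (ranger's default of 500 unrestricted trees), the same calls should still work in principle, but the TreeSHAP step may become very slow, which is the trade-off described above.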
