bgreenwell/fastshap

Different Scale of SHAP values for Approx vs. ExactSHAP

simonschoe opened this issue · 4 comments

Hi there,

great work with the package first and foremost.

Quick question: does the ApproxSHAP method scale or standardize the SHAP values in any way? When I create global feature attribution rankings for a GBM using ApproxSHAP as well as TreeSHAP, the resulting SHAP values end up on substantially different scales. For example, with ApproxSHAP the mean absolute SHAP values lie in the range 0.01-0.18, whereas with TreeSHAP they lie between 1 and 14.

Thanks in advance!

Hi @simonschoe, it would be helpful if you could post a reproducible example for me to run on my end. In general, the approximate method used by fastshap depends on the variance of the feature columns. Some problems will require more Monte Carlo reps (say, nsim > 100) to get stable results. It's also useful to set adjust = TRUE in the call to explain(). Let me know if this helps!
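For concreteness, here is a minimal sketch of those two settings (a larger nsim and adjust = TRUE) on a toy lm() fit rather than your model, just to show the pattern:

# Minimal sketch (lm on mtcars, not the poster's GBM) showing a larger nsim
# and adjust = TRUE; the same pattern applies to any model with a pred_wrapper.
library(fastshap)

fit <- lm(mpg ~ ., data = mtcars)
X <- subset(mtcars, select = -mpg)  # feature columns only

set.seed(101)  # Monte Carlo estimates, so fix the seed for reproducibility
shap <- fastshap::explain(
  fit,
  X = X,
  pred_wrapper = function(object, newdata) predict(object, newdata = newdata),
  nsim = 100,    # more replications -> more stable estimates
  adjust = TRUE  # adjust so each row's values sum to f(x) minus the baseline
)
head(shap)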

Hi @bgreenwell, thanks for your reply, and sorry for the delay.

Unfortunately, it is difficult for me to provide a reproducible example since the entire workflow is predicated on proprietary data. What I can provide, however, is the following:

# Exact TreeSHAP via the model's built-in SHAP routine
shap_values_gbm <- fastshap::explain(
  extract_fit_engine(final_fit_gbm),
  X = X_gbm,
  pred_wrapper = function(object, newdata) predict(object, newdata),
  exact = TRUE,
  newdata = NULL,
  .parallel = TRUE
)

# Approximate (Monte Carlo) SHAP with 1000 replications and adjustment
shap_values_gbm2 <- fastshap::explain(
  extract_fit_engine(final_fit_gbm),
  X = X_gbm,
  pred_wrapper = function(object, newdata) predict(object, newdata),
  nsim = 1000, adjust = TRUE,
  newdata = NULL,
  .parallel = TRUE
)

These are the two snippets that run TreeSHAP and ApproxSHAP on my machine, respectively. The resulting top 10 rankings look as follows (the code between the computation of shap_values_gbm/shap_values_gbm2 and the generation of the plots is identical for both approaches):

TreeSHAP: [figure SHAP_global_feature_attribution_gbm, top-10 global feature attribution ranking]

ApproxSHAP: [figure SHAP_global_feature_attribution_gbm_approx, top-10 global feature attribution ranking]
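In case it helps: the ranking itself is just a per-feature mean(|SHAP|) summary, along the lines of the sketch below (not the exact plotting code used here, just the same idea):

# Generic pattern for a top-10 mean(|SHAP|) ranking from a matrix/tibble of
# SHAP values; not the poster's actual plotting code.
library(dplyr)
library(tidyr)

shap_values_gbm2 %>%
  as.data.frame() %>%
  pivot_longer(everything(), names_to = "feature", values_to = "shap") %>%
  group_by(feature) %>%
  summarize(mean_abs_shap = mean(abs(shap)), .groups = "drop") %>%
  slice_max(mean_abs_shap, n = 10)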

Hope this provides some context as to why/how the difference occurs. Best, Simon

@simonschoe The only thing I can think of is the scale on which the Shapley values are being returned in each approach. For example, for a binary outcome in a GLM, Shapley values could be returned on either the link or the response scale. The pred_wrapper argument lets you specify this manually, but it may not match what XGBoost produces internally when using the exact (i.e., XGBoost's internal) SHAP procedure.
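As an illustration of what I mean (a sketch only, assuming an xgboost binary classifier, not necessarily your setup): XGBoost's exact TreeSHAP values are returned on the log-odds (margin) scale, so the Monte Carlo values are only comparable if the pred_wrapper also predicts on that scale.

# Sketch only; assumes a fitted xgb.Booster `bst` and a numeric feature matrix `X`.
library(fastshap)
library(xgboost)

# Wrapper on the probability (response) scale -- NOT comparable to TreeSHAP output
pfun_prob <- function(object, newdata) {
  predict(object, newdata = newdata)
}

# Wrapper on the log-odds (link/margin) scale -- matches XGBoost's internal SHAP
pfun_margin <- function(object, newdata) {
  predict(object, newdata = newdata, outputmargin = TRUE)
}

# shap_mc    <- fastshap::explain(bst, X = X, pred_wrapper = pfun_margin,
#                                 nsim = 100, adjust = TRUE)
# shap_exact <- fastshap::explain(bst, X = X, exact = TRUE)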

@bgreenwell But if it is simply a scaling issue, shouldn't I at least obtain somewhat similar rank orderings and the same sign of the effects? In the above example, despite running 1,000 simulations, the two rankings are still very different from each other...