ykwon0407/WeightedSHAP

How can we calculate a model's global feature importance from exp_dict['value_list']?

CoteDave opened this issue · 5 comments

Hi,

Can you please add an example showing how we can use exp_dict['value_list'] to calculate the global feature importance of a model?

Thanks !

I was thinking of doing something like this (calculating the mean of all absolute WeightedSHAP values), but I'm not sure if this is right:
[screenshot: code computing the mean of the absolute WeightedSHAP values across test samples]
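In code, roughly the following. This is a minimal sketch, assuming `exp_dict['value_list']` holds one WeightedSHAP attribution vector per explained instance, i.e. an array of shape `(n_test_samples, n_features)`; that shape is my assumption, so check it against your own output.

```python
import numpy as np

# Assumed shape: (n_test_samples, n_features); one WeightedSHAP
# attribution vector per explained test instance.
values = np.asarray(exp_dict['value_list'])

# Global importance as the mean absolute attribution per feature,
# analogous to how mean |SHAP| is often used as a global summary.
global_importance = np.abs(values).mean(axis=0)

# Rank features from most to least important.
ranking = np.argsort(global_importance)[::-1]
```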

Hi @CoteDave, WeightedSHAP is originally designed for the local feature importance of a model prediction, so it does not directly provide a global feature importance. That said, we believe the mean of the absolute WeightedSHAP values you implemented in the figure is a reasonable extension of our method. One thing you may want to check is the meaning of its value: global feature importance is often defined with an expected loss function (a good reference on global feature importance is SAGE), whereas WeightedSHAP considers a model prediction.

Another straightforward way to obtain global feature importance in the WeightedSHAP style is to use a different coalition function. In our paper, the marginal contribution (Equation (2)) is defined with the conditional coalition function (Equation (1)). A WeightedSHAP version of global feature importance is achievable if that coalition function is replaced with the expected loss function used in SAGE.
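To sketch the contrast between the two coalition functions (illustrative only: `cond_expectation` stands in for a surrogate estimating $\mathbb{E}[f(X) \mid X_S = x_S]$, and `loss` is, e.g., cross-entropy; neither name comes from the WeightedSHAP codebase):

```python
import numpy as np

def v_local(S, x, cond_expectation):
    """Conditional coalition function in the spirit of Equation (1):
    the model prediction given only the features in coalition S."""
    return cond_expectation(x, S)  # estimate of E[f(X) | X_S = x_S]

def v_global(S, X, y, cond_expectation, loss):
    """SAGE-style coalition function: negative expected loss when the
    model only sees the features in S, averaged over the dataset."""
    preds = np.array([cond_expectation(x, S) for x in X])
    return -np.mean([loss(p, t) for p, t in zip(preds, y)])
```

Plugging `v_global` into the marginal contribution in place of `v_local` is what turns a local attribution into a global one in this framing.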

Hi @ykwon0407,

Thanks for the fast answer! I'll definitely take a look at SAGE and its expected loss function.

For local feature importance, if you could provide the code for how you made this comparison, it would help me better understand how to practically use WeightedSHAP for local explanation:
[figure: comparison plot of local attribution methods referenced from the paper]

Finally, in a local feature importance context, I'm not sure how to interpret this plot:
[figure: absolute distance $|f(x)-\mathbb{E}[f(X) \mid X_S = x_S]|$ as features are added]

Thanks!

@CoteDave We are going to make more examples including MNIST available soon!

As for the second example, it is very interesting! I've never seen such a pattern before; most of the patterns I've seen simply decrease as more features are added. Sharing my experience, local feature importance is highly dependent on the data distribution, so your result reflects a characteristic of your data distribution, for instance, how many true (or noisy) features are in your dataset. According to the figure, only a few features (say, fewer than 12) appear to have an impact on the prediction function.

You may also want to double-check that (1) the surrogate function that generates the conditional expectation is sufficiently reliable, (2) enough Monte Carlo samples are used for the marginal contribution estimation, and (3) the absolute distance $|f(x)-\mathbb{E}[f(X) \mid X_S = x_S]|$ shown in the figure does not have a large variance. To address issue (3), you can repeat the entire estimation procedure multiple times and use summary values (in our paper, we use 50 independent runs for 100 test samples); see the sketch below. This minimizes the randomness from the intermediate estimation procedures (surrogate + marginal contribution). Thank you for sharing your analysis (we need more such cases), and we hope you find this helpful.
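A rough sketch of the repetition idea for issue (3); `compute_weightedshap` is a hypothetical wrapper around the full pipeline (surrogate training + marginal-contribution estimation), not a function from this repository:

```python
import numpy as np

n_runs = 50  # the paper uses 50 independent runs for 100 test samples

# Each run repeats the entire estimation pipeline with a fresh seed.
runs = [compute_weightedshap(X_test, seed=s) for s in range(n_runs)]
runs = np.stack(runs)  # shape: (n_runs, n_test_samples, n_features)

# Summarize across runs to damp the randomness of the surrogate and
# the Monte Carlo marginal-contribution estimates.
mean_attr = runs.mean(axis=0)
std_attr = runs.std(axis=0)  # a large std flags unstable attributions
```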

Thanks! Very helpful! :)