jpmml/jpmml-evaluator

API for Shapley value estimation

popovstefan opened this issue · 3 comments

I have a project where I would like to use a LightGBM model trained in Python to predict feature contributions (Shapley values), in the same manner as answered in this StackOverflow question:

Is this possible in the current version of this library?
I have gone through the documentation and various JPMML tutorials and couldn't figure out a way to do that. I have successfully trained, converted, and deployed a model in a Java app, but with it I can only predict probabilities (simple model inference).

Shapley values are a model evaluation-time phenomenon, not a model training-time or conversion-time phenomenon.

Therefore, the JPMML-LightGBM library needs no changes in this area.

Moving this issue to a more appropriate location.

There is a related project, which performs simple feature impact analysis with various tree ensemble methods (boosting, bagging):
https://github.com/vruusmann/rf_feature_impact

What's the canonical algorithm for estimating Shapley values?
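For reference, the textbook definition can be computed exactly by enumerating feature coalitions: feature i receives the weighted average of its marginal contribution f(S ∪ {i}) − f(S) over all subsets S of the other features, with weight |S|! (n − |S| − 1)! / n!. The sketch below is a minimal illustration of that definition and not anything that exists in JPMML-Evaluator. The class and method names are hypothetical, the "model" is a toy function, and "missing" features are replaced by a single baseline row, which is one assumption among several possible ones. It costs O(n · 2^n) model calls, so it is only usable for a handful of features. For tree ensembles, the shap library's TreeExplainer uses the polynomial-time TreeSHAP algorithm instead.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; none of these names exist in JPMML-Evaluator.
class ShapleySketch {

    /**
     * Exact Shapley values of instance x relative to a single baseline row.
     * Features outside a coalition take their baseline value (an assumption).
     */
    static double[] shapley(ToDoubleFunction<double[]> model, double[] x, double[] baseline) {
        int n = x.length;
        // Precompute factorials 0! .. n! for the coalition weights.
        double[] fact = new double[n + 1];
        fact[0] = 1d;
        for (int k = 1; k <= n; k++) {
            fact[k] = fact[k - 1] * k;
        }
        double[] phi = new double[n];
        for (int i = 0; i < n; i++) {
            // Enumerate every coalition S (as a bitmask) that excludes feature i.
            for (int mask = 0; mask < (1 << n); mask++) {
                if ((mask & (1 << i)) != 0) {
                    continue;
                }
                int size = Integer.bitCount(mask);
                double weight = fact[size] * fact[n - size - 1] / fact[n];
                double withoutI = model.applyAsDouble(mix(x, baseline, mask));
                double withI = model.applyAsDouble(mix(x, baseline, mask | (1 << i)));
                phi[i] += weight * (withI - withoutI);
            }
        }
        return phi;
    }

    /** Builds an input that uses x for features in the mask and the baseline elsewhere. */
    static double[] mix(double[] x, double[] baseline, int mask) {
        double[] z = baseline.clone();
        for (int j = 0; j < x.length; j++) {
            if ((mask & (1 << j)) != 0) {
                z[j] = x[j];
            }
        }
        return z;
    }

    public static void main(String[] args) {
        // Toy stand-in for a model: f(z) = 2*z0 + z1*z2
        ToDoubleFunction<double[]> model = z -> 2 * z[0] + z[1] * z[2];
        double[] phi = shapley(model, new double[] {1, 2, 3}, new double[] {0, 0, 0});
        System.out.println(Arrays.toString(phi));
    }
}
```

In the toy example the contributions come out to roughly 2, 3 and 3. They sum to f(x) − f(baseline) = 8, which is the additivity property that makes Shapley values attractive for explaining a single prediction.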

Ideally, the predicted value of the target field could implement some marker interface(s), which would trigger the computation of Shapley values in situ. The Pythonic approach, where every prediction aspect (e.g. predict, predict_proba, shap) involves running the whole prediction again from scratch, seems kind of wasteful.
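A rough sketch of what such a marker interface could look like, in the spirit of the existing `HasProbability` interface. Everything here is hypothetical: `HasFeatureContributions`, `ExplainedPrediction` and `MarkerInterfaceSketch` are made-up names, not part of JPMML-Evaluator, and the numbers are placeholders. The point is only that the caller checks the target value for the interface and reads the contributions from the same evaluation result, without a second pass.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker interface; not part of JPMML-Evaluator.
interface HasFeatureContributions {
    Map<String, Double> getFeatureContributions();
}

// Hypothetical target value that carries the prediction and its
// per-feature contributions, both produced by one evaluation pass.
final class ExplainedPrediction implements HasFeatureContributions {
    private final double value;
    private final Map<String, Double> contributions;

    ExplainedPrediction(double value, Map<String, Double> contributions) {
        this.value = value;
        this.contributions = Collections.unmodifiableMap(new LinkedHashMap<>(contributions));
    }

    double getValue() {
        return value;
    }

    @Override
    public Map<String, Double> getFeatureContributions() {
        return contributions;
    }
}

class MarkerInterfaceSketch {

    /** Returns the contributions if the target value exposes them, else an empty map. */
    static Map<String, Double> contributionsOf(Object targetValue) {
        if (targetValue instanceof HasFeatureContributions) {
            return ((HasFeatureContributions) targetValue).getFeatureContributions();
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        Map<String, Double> contributions = new LinkedHashMap<>();
        contributions.put("age", 0.30);      // placeholder values
        contributions.put("income", -0.10);
        Object targetValue = new ExplainedPrediction(0.70, contributions);

        System.out.println(contributionsOf(targetValue));          // value that supports the interface
        System.out.println(contributionsOf(Double.valueOf(0.70))); // plain value, no contributions
    }
}
```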

@vruusmann If there is a PMML (.xml) file containing a preprocessor plus a model, is there a way to use it to produce only the preprocessed data and not the final prediction? That is, only apply the transforms, similar to an sklearn Pipeline's transform().

More context (not necessary for you to read): I am trying to use PMML and the shap library together. TreeExplainer in the shap library needs the actual sklearn tree classes. If I can get the preprocessed data from the PMML, I can pass it to the model object in the shap library. I was hoping there would be some way to convert PMML back to an sklearn Pipeline, but that is probably not possible.