This project explores pooling strategies for Shapley Values. Because Shapley Values are additive, input tokens can be combined into coarser phrases and their Shapley Values summed to obtain phrase-level explanations. Here a phrase is defined loosely as a group of words that serves some syntactic function in a given context.
We explore different strategies of pooling Shapley Values into:
- sentences
- k-word phrases
- language-syntax-tree-defined phrases (partially implemented)
- adaptively-defined phrases
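The additivity argument can be illustrated with a minimal sketch of k-word pooling. This is not the package's implementation: the function name `pool_k_words`, the token list, and the Shapley Values below are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def pool_k_words(
    tokens: Sequence[str], values: Sequence[float], k: int
) -> list[tuple[str, float]]:
    """Group consecutive tokens into chunks of k and sum their Shapley Values.

    Hypothetical helper, not part of the package's API.
    """
    if len(tokens) != len(values):
        raise ValueError("tokens and values must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    pooled = []
    for start in range(0, len(tokens), k):
        phrase = " ".join(tokens[start:start + k])
        # Additivity: the phrase-level value is the sum of its token-level values.
        pooled.append((phrase, sum(values[start:start + k])))
    return pooled


# Made-up token-level Shapley Values for a five-token input.
tokens = ["the", "movie", "was", "not", "good"]
values = [0.5, 1.0, 0.25, -2.0, 0.75]

phrases = pool_k_words(tokens, values, k=2)
print(phrases)
```

Because pooling only regroups and sums, the pooled values add up to the same total as the token-level values, so the explanation still accounts for the full model output.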
This project uses the Poetry Package Manager and the recommended way to install the project is to:
- Build the package:
poetry build
- Find the wheel in the dist folder and install the wheel with pip:
pip install <path-to-wheel>
- Because some methods here rely on SpaCy providing the dependency tree, you will need to download the SpaCy pipeline for the English language:
python -m spacy download en_core_web_sm
Warning
Python 3.10 or newer is required for this package to work. We make extensive use of functional programming concepts such as structural pattern matching, and some of these facilities are only available in Python 3.10 and newer.
Warning
If you're developing the project, install with poetry install instead.
The most naive pipeline takes all the generated Shapley Values and pools them sentence by sentence. With the provided example file shap.pkl, the pipeline can be run as follows:
python -m shap_adapool.pooling_strategies.sentence_pooling
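As a rough illustration of what sentence-by-sentence pooling means, the sketch below sums token-level values up to each sentence boundary. It assumes a sentence ends at any token that finishes with ".", "!" or "?"; the module above may segment sentences differently, and the function name `pool_by_sentence` and the example data are hypothetical.

```python
from typing import Sequence


def pool_by_sentence(
    tokens: Sequence[str],
    values: Sequence[float],
    terminators: tuple[str, ...] = (".", "!", "?"),
) -> list[tuple[str, float]]:
    """Sum Shapley Values over each sentence (hypothetical helper).

    Assumption: a token ending with one of `terminators` closes a sentence.
    """
    pooled = []
    current_tokens: list[str] = []
    current_sum = 0.0
    for token, value in zip(tokens, values):
        current_tokens.append(token)
        current_sum += value
        if token.endswith(terminators):
            pooled.append((" ".join(current_tokens), current_sum))
            current_tokens, current_sum = [], 0.0
    # Keep trailing tokens that are not followed by a terminator.
    if current_tokens:
        pooled.append((" ".join(current_tokens), current_sum))
    return pooled


# Made-up token-level Shapley Values for two short sentences.
sentence_tokens = ["Great", "plot.", "Weak", "ending."]
sentence_values = [1.0, 0.5, -0.75, -0.25]

sentences = pool_by_sentence(sentence_tokens, sentence_values)
print(sentences)
```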
TODO
Language Syntax Tree Pooling traverses a syntax tree and forms phrases defined on that tree:
python -m shap_adapool.pooling_strategies.syntax_tree_pooling
Note
For now the module only prints out a syntax tree for a predefined sentence. We will soon implement the pooling strategy fully.
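One possible way to form phrases on a dependency tree is sketched below. It is not the project's strategy, which is not yet implemented. The sketch assumes the parse is given as a list of head indices in which the root points to itself (the convention SpaCy uses for token heads), and it pools each subtree hanging off the root into one phrase. The head indices are written by hand so the example runs without SpaCy; all names and values are hypothetical.

```python
from typing import Sequence


def subtree_indices(heads: Sequence[int], node: int) -> list[int]:
    """Return the sorted indices of `node` and all of its descendants."""
    children: dict[int, list[int]] = {}
    for index, head in enumerate(heads):
        if head != index:  # the root is its own head and has no parent
            children.setdefault(head, []).append(index)
    stack, seen = [node], []
    while stack:
        current = stack.pop()
        seen.append(current)
        stack.extend(children.get(current, []))
    return sorted(seen)


def pool_by_root_children(
    tokens: Sequence[str], values: Sequence[float], heads: Sequence[int]
) -> list[tuple[str, float]]:
    """Pool the root alone and each subtree attached to the root (hypothetical)."""
    root = next(i for i, head in enumerate(heads) if head == i)
    groups = [[root]]
    for index, head in enumerate(heads):
        if head == root and index != root:
            groups.append(subtree_indices(heads, index))
    groups.sort(key=lambda group: group[0])  # restore left-to-right order
    return [
        (" ".join(tokens[i] for i in group), sum(values[i] for i in group))
        for group in groups
    ]


# Hand-written parse of "the cat chased a mouse"; "chased" (index 2) is the root.
tree_tokens = ["the", "cat", "chased", "a", "mouse"]
tree_heads = [1, 2, 2, 4, 2]
tree_values = [0.25, 1.0, -0.5, 0.25, 0.5]  # made-up Shapley Values

tree_phrases = pool_by_root_children(tree_tokens, tree_values, tree_heads)
print(tree_phrases)
```

Splitting only at the root gives fairly coarse phrases; descending further into the tree before pooling would give finer ones.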
TODO
For development:
- Clone this repo.
- Install the repo using poetry install. This will install the package in editable mode.