nredell/shapFlex

Clarify and add sampling methods

nredell opened this issue · 3 comments

The sampling method(s) in the package need to be more clearly spelled out. There are a couple of related methods in the literature that I'd like to incorporate. Namely, there should be a clear trade-off that the user can make between sampling instances vs. features. Right now, the stochasticity in the algorithm comes from sampling a random instance and shuffling its features once...but there might be a benefit to sampling one instance and shuffling its features multiple times. Both approaches should converge in the limit, but the whole point of the Monte Carlo approach is that we're nowhere near "the limit". Also, the impact of feature dependence needs to be worked out. I've done some reading here but I'm not confident about what the best approach is.
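To make the instance vs. feature trade-off concrete, here is a minimal sketch in Python of a permutation-based Monte Carlo Shapley estimate. It is not the package's code: the function name `mc_shapley` and the parameters `n_instances` and `n_shuffles` are hypothetical, and features are treated as independent (no handling of feature dependence).

```python
import random


def mc_shapley(predict, x, background, feature, n_instances, n_shuffles, rng):
    """Monte Carlo Shapley estimate for one feature of the instance `x`.

    Hypothetical sketch: `n_instances` reference rows are drawn from
    `background`, and each one is reused for `n_shuffles` feature orderings.
    """
    p = len(x)
    total = 0.0
    for _ in range(n_instances):
        z = rng.choice(background)  # random reference instance
        for _ in range(n_shuffles):
            order = list(range(p))
            rng.shuffle(order)  # random feature ordering
            pos = order.index(feature)
            before = set(order[:pos])  # features ahead of `feature` come from x
            # Hybrid rows that differ only in the value of `feature`.
            with_f = [x[j] if (j in before or j == feature) else z[j] for j in range(p)]
            without_f = [x[j] if j in before else z[j] for j in range(p)]
            total += predict(with_f) - predict(without_f)
    return total / (n_instances * n_shuffles)


def linear_model(v):
    # Toy model used only for illustration.
    return 2.0 * v[0] + 3.0 * v[1] - 1.0 * v[2]


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]

# Same budget of model evaluations, split two different ways.
many_instances = mc_shapley(linear_model, x, background, 1, 8, 1, random.Random(1))
many_shuffles = mc_shapley(linear_model, x, background, 1, 1, 8, random.Random(1))
```

With `n_shuffles = 1` this matches the current behavior described above (one shuffle per sampled instance); raising `n_shuffles` while lowering `n_instances` spends the same number of model evaluations on fewer reference rows. Which split has lower variance for a given model is exactly what the simulations would need to show.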

Added argument shapFlex(shuffle = ...) which supports the explore/exploit trade-off. Need to run some simulations to look at parameter recovery along this scale.

This is a great paper about asymmetric Shapley values and causality (https://arxiv.org/pdf/1910.06358.pdf). The implementation is fairly straightforward, though the API needs some thought for how the user specifies causal constraints. lavaan and r-causal are possible approaches, but I'm not a huge fan of specifying constraints in one long string. I need to look more into their implementations and others. In any case, this is next on the implementation list because it's an infinitely useful iml method.
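One alternative to a single long constraint string is a list of (cause, effect) pairs. The sketch below, in Python, is only an illustration of that idea and not a proposed shapFlex API: the function names and the pair format are assumptions. It relies on my reading of the paper that asymmetric Shapley values only use feature orderings in which causes precede their effects, and it draws such orderings by simple rejection sampling, which is only practical for a small number of features.

```python
import random


def is_consistent(order, constraints):
    """True if every cause appears before its effect in `order`."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[cause] < pos[effect] for cause, effect in constraints)


def sample_consistent_order(features, constraints, rng, max_tries=10000):
    """Draw a random feature ordering that respects the causal constraints.

    Hypothetical helper: shuffles until the ordering is consistent, so every
    consistent ordering is equally likely to be returned.
    """
    order = list(features)
    for _ in range(max_tries):
        rng.shuffle(order)
        if is_consistent(order, constraints):
            return list(order)
    raise ValueError("no consistent ordering found; constraints may be cyclic")


features = ["age", "education", "income", "zip_code"]
# Each pair reads (cause, effect): age -> education -> income.
constraints = [("age", "education"), ("education", "income")]

order = sample_consistent_order(features, constraints, random.Random(42))
```

An ordering drawn this way could replace the unconstrained shuffle in a Monte Carlo Shapley loop. Unconstrained features such as `zip_code` can still land anywhere in the ordering.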

We're going to go ahead and close this out. This package has gone full "causal".