shap/shap

Questions: question about SamplingExplainer


Problem Description

Hi, everyone! I have been reading the code of the function sampling_estimate.
Assume we have a data instance x with M features.

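  • We keep the features up to and including the j-th one (in a randomly shuffled feature order) at their original values and replace all other features with values from a row of the background dataset. Evaluating the model on this gives evals_on (with the j-th feature).
  • We keep only the features before the j-th one in that order at their original values and replace all others, including the j-th, from the same background row. This gives evals_off (without the j-th feature).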

If the features are independent, why don't we use the original x for evals_on and, for evals_off, keep every feature except the j-th at its original value, replacing only the j-th feature with a value from the background dataset?

I'm not sure what the purpose of the current approach is. Thanks in advance!

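    def sampling_estimate(self, j, f, x, X, nsamples=10):
        # View into self.X_masked: the first nsamples rows hold "on" samples, the last nsamples rows hold "off" samples
        X_masked = self.X_masked[:nsamples * 2,:]
        inds = np.arange(X.shape[1])

        for i in range(nsamples):
            # Draw a random feature ordering and a random background row
            np.random.shuffle(inds)
            pos = np.where(inds == j)[0][0]
            rind = np.random.randint(X.shape[0])
            # "on": only the features after j in the ordering come from the background row
            X_masked[i, :] = x
            X_masked[i, inds[pos+1:]] = X[rind, inds[pos+1:]]
            # "off": feature j and everything after it come from the same background row
            X_masked[-(i+1), :] = x
            X_masked[-(i+1), inds[pos:]] = X[rind, inds[pos:]]

        evals = f(X_masked)
        evals_on = evals[:nsamples]
        # "off" rows were filled from the end, so reverse them to pair with the "on" rows
        evals_off = evals[nsamples:][::-1]
        d = evals_on - evals_off

        return np.mean(d, 0), np.var(d, 0)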
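
For context, the loop above appears to be a Monte Carlo version of the permutation form of the Shapley value: for a shuffled feature order, it measures what feature j adds on top of the features that precede it. The sketch below is not shap code. It is a standalone illustration that replaces the random sampling with a full enumeration of orderings and background rows, using a hypothetical toy model and a hypothetical helper named exact_permutation_values. Under those assumptions the attributions sum exactly to f(x) minus the mean background prediction, because the on/off differences telescope within each ordering.

```python
import itertools
import numpy as np

def exact_permutation_values(f, x, X):
    """Average the on/off difference over every feature ordering and every background row."""
    M = x.shape[0]
    phi = np.zeros(M)
    perms = list(itertools.permutations(range(M)))
    for perm in perms:
        perm = np.array(perm)
        for pos, j in enumerate(perm):
            # "on": features after j in this ordering come from the background
            on = np.tile(x, (X.shape[0], 1))
            on[:, perm[pos + 1:]] = X[:, perm[pos + 1:]]
            # "off": feature j and everything after it come from the background
            off = np.tile(x, (X.shape[0], 1))
            off[:, perm[pos:]] = X[:, perm[pos:]]
            phi[j] += np.mean(f(on) - f(off))
    return phi / len(perms)

# Hypothetical toy model with an interaction between features 0 and 1
def f(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

# Hypothetical background: full factorial of {-1, +1}, so the columns are independent
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
x = np.array([1.0, 1.0, 1.0])

phi = exact_permutation_values(f, x, X)
gap = f(x[None, :])[0] - f(X).mean()
print(phi, phi.sum(), gap)  # phi is about [0.5, 0.5, 1.0]; both phi.sum() and gap are 2.0
```

In this toy case the interaction term is split evenly between features 0 and 1, which is only possible because each feature is evaluated at every position in the ordering.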

Alternative Solutions

    def sampling_estimate(self, j, f, x, X, nsamples=10):
        X_masked = self.X_masked[:nsamples * 2,:]
        inds = np.arange(X.shape[1])

        for i in range(nsamples):
            np.random.shuffle(inds)
            pos = np.where(inds == j)[0][0]
            rind = np.random.randint(X.shape[0])
            X_masked[i, :] = x
            X_masked[-(i+1), :] = x
            X_masked[-(i+1), inds[pos]] = X[rind, inds[pos]]

        evals = f(X_masked)
        evals_on = evals[:nsamples]
        evals_off = evals[nsamples:][::-1]
        d = evals_on - evals_off

        return np.mean(d, 0), np.var(d, 0)
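
A quick way to see what the alternative above would measure is to average it over the whole background set instead of sampling. The sketch below does that for a hypothetical toy model with an interaction term; the helper name one_at_a_time_values and the data are assumptions for illustration, not shap code. Replacing only feature j corresponds to always placing j last in the ordering, so for an additive model it gives attributions that sum to f(x) minus the mean background prediction, but with interactions that sum no longer matches.

```python
import itertools
import numpy as np

def one_at_a_time_values(f, x, X):
    """Attribution for each j: f(x) minus the mean of f with only feature j replaced."""
    fx = f(x[None, :])[0]
    phi = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        off = np.tile(x, (X.shape[0], 1))
        off[:, j] = X[:, j]  # only feature j comes from the background
        phi[j] = fx - f(off).mean()
    return phi

# Hypothetical toy models: one with an interaction, one purely additive
def f_inter(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

def f_add(X):
    return X[:, 0] + X[:, 1] + X[:, 2]

# Hypothetical background: full factorial of {-1, +1}, so the columns are independent
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
x = np.array([1.0, 1.0, 1.0])

phi_inter = one_at_a_time_values(f_inter, x, X)
gap_inter = f_inter(x[None, :])[0] - f_inter(X).mean()
phi_add = one_at_a_time_values(f_add, x, X)
gap_add = f_add(x[None, :])[0] - f_add(X).mean()

print(phi_inter.sum(), gap_inter)  # 3.0 vs 2.0: the interaction is counted twice
print(phi_add.sum(), gap_add)      # 3.0 vs 3.0: fine for an additive model
```

So feature independence alone does not seem to be enough to justify this shortcut; the model would also need to be additive in its features.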

Feature request checklist

  • I have checked the issue tracker for duplicate issues.
  • I'd be interested in making a PR to implement this feature

Maybe another option: keep the j-th feature at its original value and sample all other features from the background dataset to get evals_on, then sample all features, including the j-th, from the background dataset to get evals_off?

Where can I find a theoretical description of this?

    def sampling_estimate(self, j, f, x, X, nsamples=10):
        X_masked = self.X_masked[:nsamples * 2,:]
        inds = np.arange(X.shape[1])

        for i in range(nsamples):
            np.random.shuffle(inds)
            pos = np.where(inds == j)[0][0]
            rind = np.random.randint(X.shape[0])
            X_masked[i, :] = x
            X_masked[i, inds[pos+1:]] = X[rind, inds[pos+1:]]
            X_masked[i, inds[:pos]] = X[rind, inds[:pos]]
            X_masked[-(i+1), :] = x
            X_masked[-(i+1), inds[:]] = X[rind, inds[:]]

        evals = f(X_masked)
        evals_on = evals[:nsamples]
        evals_off = evals[nsamples:][::-1]
        d = evals_on - evals_off

        return np.mean(d, 0), np.var(d, 0)
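
This second variant can be checked the same way by averaging over the whole background set. The sketch below is an illustration under assumptions (a hypothetical toy model, a hypothetical helper named first_position_values, and made-up data), not shap code. Keeping only feature j at its original value corresponds to always placing j first in the ordering, which is the mirror image of replacing only feature j. It again matches f(x) minus the mean background prediction for an additive model, but misses interaction effects otherwise.

```python
import itertools
import numpy as np

def first_position_values(f, x, X):
    """Attribution for each j: mean of f with only feature j fixed to x[j], minus mean of f on the background."""
    base = f(X).mean()
    phi = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        on = X.copy()
        on[:, j] = x[j]  # only feature j keeps its original value
        phi[j] = f(on).mean() - base
    return phi

# Hypothetical toy models: one with an interaction, one purely additive
def f_inter(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

def f_add(X):
    return X[:, 0] + X[:, 1] + X[:, 2]

# Hypothetical background: full factorial of {-1, +1}, so the columns are independent
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
x = np.array([1.0, 1.0, 1.0])

phi_inter = first_position_values(f_inter, x, X)
gap_inter = f_inter(x[None, :])[0] - f_inter(X).mean()
phi_add = first_position_values(f_add, x, X)
gap_add = f_add(x[None, :])[0] - f_add(X).mean()

print(phi_inter, gap_inter)    # about [0, 0, 1] vs 2.0: the interaction is not attributed at all
print(phi_add.sum(), gap_add)  # 3.0 vs 3.0: fine for an additive model
```

Averaging over all positions of j in the ordering, as the current sampling_estimate does by shuffling inds, sits between these two extremes.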