kundajelab/deeplift

Selection of reference value and interpretation of scores

Poojavk93 opened this issue · 3 comments

Hey!

While calculating the 'scores', I use the first row of my test data for input_data_list, as follows:

scores = np.array(deeplift_contribs_func(task_idx=0, input_data_list=[xtest.iloc[[0]]], batch_size=10, progress_update=1000))

Question 1:
Can I use my training data as reference? If so, do I use the complete data?
Also, how do I check for my reference value?

Question 2:
When you say "A negative contribution score on an input means that the input contributed to moving the output below its reference value", what does that indicate? Is the presence of said feature pushing the output away from the intended prediction? Does the specific feature have a negative-direction impact on the target variable?

Thank you!

Hi, sorry for not replying sooner. I know you emailed to say you figured it out, but for others who may be interested in the answers:

“Can I use my training data as reference? If so, do I use the complete data?”

The core DeepLIFT API allows the user to specify one reference per example. Thus, if using the training data as a reference, you would have to select a representative point from the training data to act as the reference/baseline. It is definitely advisable to use multiple references per example for robustness, but this would have to be implemented by calling DeepLIFT with the different choices of reference and then averaging the resulting contribution scores. I have wrappers to do this for genomic sequence data (where shuffled versions of the input are used as the reference), but have not implemented more general wrappers for other domains. However, DeepSHAP (an extension of DeepLIFT implemented in the SHAP repo) has built-in support for multiple references per example. It is not necessary to use the complete training data as references, as that would take a rather long time to compute (runtime is linear in the number of references you use per example).
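As a minimal sketch of the averaging approach (the names X, reference_rows and the helper itself are hypothetical; deeplift_contribs_func is assumed to have been obtained from deeplift_model.get_target_contribs_func(...), and the input_references_list keyword follows the pattern used in this repo's examples):

import numpy as np

def averaged_contribs(deeplift_contribs_func, X, reference_rows,
                      task_idx=0, batch_size=10):
    # Compute DeepLIFT scores once per candidate reference, then average
    per_reference_scores = []
    for ref in reference_rows:
        # Tile the single reference so there is one reference per example,
        # since the core API takes one reference per input example
        refs = np.tile(ref[None, :], (len(X), 1))
        scores = np.array(deeplift_contribs_func(
            task_idx=task_idx,
            input_data_list=[X],
            input_references_list=[refs],
            batch_size=batch_size,
            progress_update=1000))
        per_reference_scores.append(scores)
    # Average the contribution scores across the different references
    return np.mean(per_reference_scores, axis=0)

(For comparison, DeepSHAP's shap.DeepExplainer takes a background dataset directly and handles the multi-reference averaging internally.)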

I am not sure what you mean by “how do I check for my reference value” - you could compute the output of the model given a particular reference input, and that would tell you the reference value of the output, though I am not sure if this is what you were asking.
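For instance (a hypothetical snippet, assuming the original Keras model is available as keras_model and reference is a single reference row as a numpy array):

import numpy as np

# The model's prediction on the reference input alone is, by definition,
# the reference value of the output
reference_output = keras_model.predict(np.asarray(reference)[None, :])
print("Reference value of the output:", reference_output)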

Regarding question 2: it means that the feature with the negative contribution score is likely moving the output in the negative direction relative to the value that the output would have if that feature had instead been set to its reference (a.k.a. baseline) value. You asked if it means it’s pushing the output away from its intended prediction; that’s not the case because the reference value of the output is different from the “intended prediction”; the reference value of the output is simply the value that the output has when the model is given the reference input. Note that I used the qualifier “likely” (when I said “likely moving the output”) because DeepLIFT is a heuristic method (that’s why it’s faster than perturbation-based approaches), so the contribution scores are only an approximation of what you might conclude if you were to do exhaustive perturbations.

@Poojavk93 so how did you figure out a good reference input?

Pasting some correspondence from my email thread with Misbah here, in case it is useful to others:

For the reference/baseline, I would say that domain knowledge is your friend. If your examples correspond to different patients, then a good place to start would be to ask the question "what values do my features take on in normal patients?" and then set your reference according to that. It's also possible to compute scores using multiple different references, and then average the results (for example, if you have a "control" group of patients, then different patients from this control group could serve as references).
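A hypothetical sketch of that last idea, assuming xtrain is a numpy array of training examples and is_control is a boolean mask marking the control patients:

import numpy as np

control_rows = xtrain[is_control]
# Option 1: a single representative reference, e.g. the control-group mean
mean_reference = control_rows.mean(axis=0)
# Option 2: sample several control patients to serve as separate references
# (their scores would then be averaged, as in the sketch further up)
rng = np.random.default_rng(0)
sampled_references = control_rows[
    rng.choice(len(control_rows), size=10, replace=False)]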

If you are using a softmax output, then one thing I would recommend is mean-normalizing the contributions as described in the section “Adjustments for Softmax Layers” of the DeepLIFT paper (section 3.6) (also discussed in this github issue: #116)
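A rough sketch of that normalization (not the paper's exact code): compute contribution scores separately for each softmax class, then subtract the per-example mean across classes. Assuming scores_per_class has shape (num_classes, num_examples, num_features):

import numpy as np

def mean_normalize(scores_per_class):
    # For each example/feature, subtract the mean contribution across all
    # softmax classes, per "Adjustments for Softmax Layers" (section 3.6)
    return scores_per_class - scores_per_class.mean(axis=0, keepdims=True)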

If you are trying to understand which features cause an example to be classified as “high for class 0”, then yes, it is good if your reference is low for class 0 (i.e. high for class 1); that way, the features that are responsible for making something belong to class 0 will stand out as being different relative to the reference. However, there may be multiple different ways to get the classifier to predict a low value for class 0, and you would want your reference to be somewhat representative of the type of “low for class 0” examples that occur in real data. Otherwise, if your reference is very unlike anything that you would see in real data, then it may not be very practically meaningful to interpret the differences relative to the reference.

If class 1 and class 0 are mutually exclusive, then any features that have a positive score for class 0 can be considered to have a negative score for class 1 and vice versa; thus, it is not strictly necessary to use a different reference to understand class 1 vs class 0, because if you understand the features for class 0, then you can flip the sign to understand the features for class 1.
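In code, that is just a sign flip (assuming scores_class0 holds the scores computed with task_idx=0):

# For two mutually exclusive classes, scores for class 1 are the
# negated scores for class 0
scores_class1 = -scores_class0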