ck37/varimpact

Plot variable importance

Closed this issue · 5 comments

Hello Chris,

I'd like to ask about the interpretation of the numbers in the variable importance plot, in the total cholesterol example from your presentation.
For example, for the low-risk cholesterol level [0,200], the number is 0.26. Is 0.26 a percentage? How do we interpret this number? Is it the average impact on the target variable? I have the same question about the impact.

ck37 commented

For the [0,200] cholesterol level, the plot is saying that if everyone in the sample were assigned that level of cholesterol, we estimate that the mean outcome would be 0.26, i.e. 26%. That is called a treatment-specific mean (TSM) in causal inference.

Then the impact is the risk difference between setting everyone in the sample to the high-risk value of a given variable compared to the low-risk value. The more important variables will show the greatest change in the outcome variable.
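In symbols, the two quantities described above can be sketched as follows. The notation is an assumption added for illustration: $Y$ is the outcome, $A$ the variable of interest, $W$ the remaining covariates, and $a_{\text{high}}$, $a_{\text{low}}$ the high-risk and low-risk levels. Equating the counterfactual mean with this adjusted mean relies on the usual causal identification assumptions.

```latex
\mathrm{TSM}(a) = \mathbb{E}_W\!\left[\, \mathbb{E}[\, Y \mid A = a, W \,] \,\right],
\qquad
\text{impact} = \mathrm{TSM}(a_{\text{high}}) - \mathrm{TSM}(a_{\text{low}})
```

In the cholesterol example, the estimate of $\mathrm{TSM}([0,200])$ is 0.26.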

Does that make sense / any follow-up questions?

Thank you so much for the explanation. It answers my question.

ck37 commented

Excellent

I have one short question. How are the low- and high-risk levels calculated? Are they calculated with and without the features?

ck37 commented

For each level (e.g. decile) of the variable we calculate the treatment-specific mean, which is the average outcome when we counterfactually set all observations to have that variable equal to a specific level and adjust for all of the other covariates. We then review the estimated treatment-specific mean for each level of each variable and identify 1) the level that yields the highest outcome mean and 2) the level that yields the lowest outcome mean. Once we have identified those two levels, we can specify our data-adaptive target parameter as either a difference (subtracting the two TSMs, so null is 0) or a ratio (dividing them, so null is 1).
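The steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not varimpact's code: the data are simulated, the function names `treatment_specific_means` and `impact` are hypothetical, and a plain least-squares plug-in estimator stands in for whatever estimator the package actually uses.

```python
import numpy as np


def treatment_specific_means(y, a, w, levels):
    """Plug-in estimate of the TSM for each level of `a`, adjusting for `w`.

    Hypothetical helper: fits a linear outcome regression, then averages the
    predictions after setting every observation to the same level.
    """
    def design(a_vec):
        # One indicator column per level of the variable, plus the covariate.
        indicators = np.column_stack(
            [(a_vec == lv).astype(float) for lv in levels]
        )
        return np.column_stack([indicators, w])

    coef, *_ = np.linalg.lstsq(design(a), y, rcond=None)

    tsm = {}
    for lv in levels:
        # Counterfactually assign every observation to level `lv`.
        a_counterfactual = np.full(len(y), lv)
        tsm[lv] = float(np.mean(design(a_counterfactual) @ coef))
    return tsm


def impact(tsm):
    """Pick the highest- and lowest-mean levels and contrast their TSMs."""
    high = max(tsm, key=tsm.get)
    low = min(tsm, key=tsm.get)
    return {
        "high_level": high,
        "low_level": low,
        "difference": tsm[high] - tsm[low],  # null value is 0
        "ratio": tsm[high] / tsm[low],       # null value is 1
    }


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 20_000
w = rng.normal(size=n)                                   # confounder
a = np.digitize(w + rng.normal(size=n), [-0.5, 0.5])     # levels 0, 1, 2
p = 0.2 + 0.1 * a + 0.05 * np.tanh(w)                    # true outcome risk
y = rng.binomial(1, p).astype(float)                     # binary outcome

levels = [int(lv) for lv in np.unique(a)]
tsm = treatment_specific_means(y, a, w, levels)
result = impact(tsm)

print(tsm)
print(result)
```

In this simulation the true risk rises by 0.1 per level, so the highest level should be picked as high-risk and the lowest as low-risk. With real data the levels are chosen from the estimates themselves, which is why the target parameter is called data-adaptive.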