mateuszbuda/ml-stat-util

P-value larger than 1

lingchm opened this issue · 2 comments

Thank you so much for the nicely written code package.
I am using stat_util.pvalue to compare the AUC of two models. However, I obtain a p-value greater than 1. Is this possible? How should I interpret this? I know pred1 has AUC 0.71 and pred2 has 0.58.


@lingchm This is because a two-tailed p-value is computed by default, which is simply the one-tailed p-value multiplied by 2.
So whenever the one-tailed p-value is above 0.5, the result is higher than 1.

The p-value is computed by

  • bootstrapping examples (the same for pred1 and pred2),
  • computing the score (ROC AUC) for pred1 and pred2,
  • calculating the difference,
  • evaluating the percentile of the value 0.0 in the distribution of differences.
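
The steps above can be sketched roughly as follows. This is a minimal NumPy-only illustration and not the package's actual code: the names `roc_auc` and `bootstrap_pvalue` are hypothetical, the data is synthetic, and counting only differences strictly below zero is an assumption about how ties at 0.0 are handled.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """ROC AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def bootstrap_pvalue(y_true, pred1, pred2, n_bootstraps=1000, two_tailed=True, seed=0):
    """Hypothetical helper: bootstrap the AUC difference and locate 0.0 in it."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    z = []
    while len(z) < n_bootstraps:
        # Resample examples once and use the same indices for both models.
        idx = rng.integers(0, n, size=n)
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample has only one class
        z.append(roc_auc(y_true[idx], pred1[idx]) - roc_auc(y_true[idx], pred2[idx]))
    z = np.asarray(z)
    p = np.mean(z < 0.0)  # one-tailed: share of differences below zero (assumption)
    if two_tailed:
        p *= 2.0  # plain doubling, so the result can exceed 1
    return p, z

# Synthetic example: pred1 is more informative than pred2.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=200)
pred1 = y_true + rng.normal(0.0, 1.0, size=200)
pred2 = y_true + rng.normal(0.0, 3.0, size=200)

p, z = bootstrap_pvalue(y_true, pred1, pred2)
p_swapped, z_swapped = bootstrap_pvalue(y_true, pred2, pred1)
```

Swapping the two prediction arrays flips the sign of every difference, so the two doubled p-values add up to roughly 2. One of the two argument orders will therefore usually report a value at or above 1.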

Check use case 2 here: https://mateuszbuda.github.io/2019/04/30/stat.html

You can plot the values in z (the bootstrapped differences); their mean (np.mean(z)) should be roughly equal to the difference between your original scores for pred1 and pred2, i.e. 0.71 - 0.58 = 0.13 in your case.

The one-tailed p-value is the probability mass of the values in z that are < 0, so it cannot exceed 1.0.
The two-tailed p-value is the one-tailed p-value times 2.
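
To make the arithmetic concrete, here is a small sketch with made-up differences. The function name `tail_pvalues` and the example values are hypothetical. The last quantity, `2 * min(p, 1 - p)`, is a common alternative convention that stays within [0, 1]; it is shown only for comparison and is not what is described above.

```python
import numpy as np

def tail_pvalues(z):
    """Hypothetical helper returning (one-tailed, doubled, symmetric) p-values."""
    z = np.asarray(z, dtype=float)
    one_tailed = np.mean(z < 0.0)        # probability mass below zero
    doubled = 2.0 * one_tailed           # two-tailed as described in this thread
    # Alternative convention (not from this thread): double the smaller tail.
    symmetric = 2.0 * min(one_tailed, 1.0 - one_tailed)
    return one_tailed, doubled, symmetric

# Made-up differences: three of the four values are below zero.
z_example = np.array([-0.05, -0.02, -0.01, 0.03])
one_tailed, doubled, symmetric = tail_pvalues(z_example)
print(one_tailed, doubled, symmetric)  # 0.75 1.5 0.5
```

Whenever more than half of the differences fall below zero, the doubled value exceeds 1.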

In your case, a two-tailed p-value of 1.49 (a one-tailed value of 0.745) means that there is no significant difference between the AUCs of pred1 and pred2. You probably do not have many examples.

I see. Plotting the values in z was helpful. Thank you very much for your explanation.