mateuszbuda/ml-stat-util

Comparing two ML models using p value

Gvinkc opened this issue · 2 comments

Thanks a lot @mateuszbuda for the tool. I am trying to compare two models (for example, random forest and XGBoost) using a p-value.
I have the predictions from both trained models on the test set:

y_pred1 = model1.predict(x_test)
y_pred2 = model2.predict(x_test)
y_true = y_test

from sklearn.metrics import r2_score
import stat_util
p, z = stat_util.pvalue(y_true, y_pred1, y_pred2, score_fun=r2_score)

Output: 0.0
I always get p = 0.0. Could you please correct me if I have done something wrong? Thank you.

You're doing everything correctly.
Just make sure that the order of cases in y_pred1, y_pred2, and y_true is the same.

Then, you can "guess" your p-value by looking at the difference in scores between y_pred1 and y_pred2 together with the number of cases in your test set. If you have many cases and the difference between scores is large, bootstrapping can return p-value = 0.0, which really means that the p-value is very small, not exactly 0.0.
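To see why this happens, here is a minimal from-scratch sketch of the bootstrap p-value idea (not the exact stat_util implementation): with synthetic data, a clearly better model, and a large test set, no resample ever favors the worse model, so the empirical p-value comes out as exactly 0.0. The models here are simulated by adding noise of different magnitudes to y_true.

```python
import numpy as np

def r2_score(y_true, y_pred):
    # coefficient of determination, matching sklearn.metrics.r2_score
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 2000  # large test set
y_true = rng.normal(size=n)
y_pred1 = y_true + rng.normal(scale=0.2, size=n)  # simulated good model
y_pred2 = y_true + rng.normal(scale=1.0, size=n)  # simulated, clearly worse model

# One-sided bootstrap p-value: fraction of resamples in which
# model 2 scores at least as well as model 1
diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, n)  # resample cases with replacement
    diffs.append(r2_score(y_true[idx], y_pred1[idx])
                 - r2_score(y_true[idx], y_pred2[idx]))
diffs = np.array(diffs)
p = float(np.mean(diffs <= 0))
print(p)  # every resample favors model 1 here, so p == 0.0
```

Reporting such a result as p < 1/n_bootstraps (here, p < 0.0005) is more honest than p = 0.0.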

In addition, you can plot the distribution of differences from bootstrapped samples, which are stored in array z. Check the notebook with examples: https://github.com/mateuszbuda/ml-stat-util/blob/master/examples.ipynb. Each element of z is the difference between the scores of y_pred1 and y_pred2 computed on a resampled set of cases. As a sanity check, the mean of z should be close to the observed difference between the scores (note that r2_score is bounded above by 1.0). For example, if r2_score(y_true, y_pred1) is 0.4 and r2_score(y_true, y_pred2) is 0.3, then mean(z) should be approximately 0.1.
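This sanity check can be sketched as follows (a self-contained simulation: the two models and the bootstrap array z are built by hand here, standing in for the z returned by stat_util.pvalue):

```python
import numpy as np

def r2_score(y_true, y_pred):
    # coefficient of determination, matching sklearn.metrics.r2_score
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n = 1000
y_true = rng.normal(size=n)
y_pred1 = y_true + rng.normal(scale=0.5, size=n)  # simulated model 1
y_pred2 = y_true + rng.normal(scale=0.7, size=n)  # simulated model 2

observed_diff = r2_score(y_true, y_pred1) - r2_score(y_true, y_pred2)

# z: bootstrap distribution of the score difference,
# analogous to the array returned by stat_util.pvalue
z = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)
    z[b] = (r2_score(y_true[idx], y_pred1[idx])
            - r2_score(y_true[idx], y_pred2[idx]))

# mean(z) should be close to the observed score difference
print(round(observed_diff, 3), round(float(np.mean(z)), 3))
```

A histogram of z (e.g. with matplotlib) then shows how far the distribution of differences sits from zero.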

@mateuszbuda, Thank you very much.