Comparing two ML models using p value
Gvinkc opened this issue · 2 comments
Thanks a lot @mateuszbuda for the tool. I am trying to compare two models (for example, random forest and XGBoost) using a p-value.
I have predictions from both trained models on the test set:

from sklearn.metrics import r2_score
import stat_util

y_pred1 = model1.predict(x_test)
y_pred2 = model2.predict(x_test)
y_true = y_test

p, z = stat_util.pvalue(y_true, y_pred1, y_pred2, score_fun=r2_score)

Output: p = 0.0
I always get p = 0.0. Please correct me if I have done something wrong. Thank you.
You're doing everything correctly. Just make sure that the order of cases in y_pred1, y_pred2, and y_true is the same.

Then, you can try to "guess" your p-value by looking at the difference in scores between y_pred1 and y_pred2 together with the number of cases in your test set. If you have a lot of cases and the difference between scores is large, you can get p-value = 0.0 from bootstrapping, which actually means that it's very small, not exactly 0.0.
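To make the mechanism concrete, here is a minimal sketch of the bootstrap idea behind such a p-value. This is not the actual code from ml-stat-util, just an illustration under the assumption that z holds per-resample score differences (y_pred2 minus y_pred1) and that the p-value is the two-sided fraction of resamples on the "wrong" side of zero; the synthetic data at the bottom is purely hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

def bootstrap_pvalue(y_true, y_pred1, y_pred2, score_fun=r2_score,
                     n_bootstrap=2000, seed=0):
    """Sketch of a bootstrap test for the difference in scores.

    NOT the exact ml-stat-util implementation: resample cases with
    replacement, compute the score difference on each resample, and
    take twice the smaller tail fraction as a two-sided p-value.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred1 = np.asarray(y_pred1)
    y_pred2 = np.asarray(y_pred2)
    n = len(y_true)
    z = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        idx = rng.integers(0, n, n)  # sample cases with replacement
        z[i] = (score_fun(y_true[idx], y_pred2[idx])
                - score_fun(y_true[idx], y_pred1[idx]))
    # Two-sided p-value: if no resampled difference ever crosses zero,
    # this comes out as exactly 0.0 -- the situation described above.
    p = 2 * min((z <= 0).mean(), (z >= 0).mean())
    return min(p, 1.0), z

# Illustrative usage on synthetic data (assumption: model 2 is clearly
# better, so p should be very small or 0.0).
rng = np.random.default_rng(1)
n = 200
y = rng.normal(size=n)
pred1 = y + rng.normal(scale=1.0, size=n)  # noisier model
pred2 = y + rng.normal(scale=0.3, size=n)  # more accurate model
p, z = bootstrap_pvalue(y, pred1, pred2)
```

With a large, clear-cut difference like this, none of the 2000 resampled differences falls below zero, which is exactly how a reported p-value of 0.0 arises.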
In addition, you can plot the distribution of differences from the bootstrapped samples, which are stored in the array z. Check the notebook with examples: https://github.com/mateuszbuda/ml-stat-util/blob/master/examples.ipynb. Each element of z is the difference between the scores of y_pred1 and y_pred2 computed on the sampled cases. To verify that it's correct, the mean of the values in z should be close to the difference between the scores on the full test set. For example, if r2_score(y_true, y_pred1) is 3.0 and r2_score(y_true, y_pred2) is 4.0, then mean(z) should be about 1.0.
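That sanity check can be run end to end. The snippet below hand-rolls the bootstrap differences as a stand-in for the z array returned by stat_util.pvalue (an assumption about its contents, not the library's code), using hypothetical synthetic data in place of real model predictions, and confirms that mean(z) lands near the full-set score difference.

```python
import numpy as np
from sklearn.metrics import r2_score

# Synthetic stand-in data; in practice use y_test and your models' predictions.
rng = np.random.default_rng(42)
n = 300
y_true = rng.normal(size=n)
y_pred1 = y_true + rng.normal(scale=1.0, size=n)  # weaker model
y_pred2 = y_true + rng.normal(scale=0.5, size=n)  # stronger model

# Hand-rolled bootstrap differences, standing in for the z array.
z = np.array([
    r2_score(y_true[idx], y_pred2[idx]) - r2_score(y_true[idx], y_pred1[idx])
    for idx in (rng.integers(0, n, n) for _ in range(2000))
])

full_diff = r2_score(y_true, y_pred2) - r2_score(y_true, y_pred1)
# Sanity check: the bootstrap mean should land close to the full-set difference.
print(round(z.mean(), 2), round(full_diff, 2))
# To inspect the distribution: plt.hist(z, bins=50) with matplotlib.
```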
@mateuszbuda, Thank you very much.