songlab-cal/gpn

Calculation of AUPRC in GPN-MSA Figure 2b

yangzhao1230 opened this issue · 4 comments

I'm attempting to reproduce the results shown in Figure 2b, but the AUPRC values I'm calculating seem odd. I've been using the scores provided by your Hugging Face implementation. Could you provide a simple code snippet for replicating the results in Figure 2b?

I have demonstrated my calculation process in a self-contained Colab notebook, which you can access here: Colab Notebook Link. Could you please take a look and let me know if there's anything I'm missing?

Hello! I believe you just need to flip the sign of the scores. A lower score means more deleterious, so the scores are anti-correlated with the label. Apologies that this is not documented.
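For anyone following along, here is a minimal sketch of what flipping the sign looks like with sklearn. The `labels` and `scores` arrays are made-up toy values (not GPN-MSA output), and the sketch assumes label 1 marks the deleterious class; it is only meant to illustrate the sign convention, not to reproduce Figure 2b.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Hypothetical toy data: 1 = deleterious (positive class), 0 = benign.
labels = np.array([1, 1, 1, 0, 0, 0])
# Hypothetical scores where LOWER means more deleterious.
scores = np.array([-9.0, -7.5, -6.0, -1.0, 0.5, 2.0])

# sklearn expects higher scores for the positive class, so the raw
# scores rank every benign variant above every deleterious one.
auprc_raw = average_precision_score(labels, scores)

# Negating the scores restores the expected orientation.
auprc_flipped = average_precision_score(labels, -scores)

# The same idea with the trapezoidal area under the PR curve.
precision, recall, _ = precision_recall_curve(labels, -scores)
auprc_trapezoid = auc(recall, precision)

print(f"raw: {auprc_raw:.3f}, flipped: {auprc_flipped:.3f}")
```

On this toy data the raw scores give an AUPRC below the 0.5 positive-class prevalence, while the negated scores separate the classes perfectly.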

BTW, songlab/cosmic is for the upcoming v2 of the manuscript, with slight differences from v1.

Whether or not the labels are flipped, the results are very strange.
[two images attached]

The functions I used to calculate the results are from sklearn:
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

See edited notebook: link

I flipped the scores instead of the labels.
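A small sketch of why the two kinds of flip are not interchangeable for AUPRC: negating the scores keeps the same positive class, whereas inverting the labels makes the other class the positive one, which changes the prevalence baseline. The arrays below are hypothetical toy values chosen for illustration, assuming label 1 is the rarer deleterious class.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical imbalanced toy data: 2 deleterious (1) vs 6 benign (0).
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0])
# Hypothetical scores where LOWER means more deleterious.
scores = np.array([-5.0, -2.0, -3.0, 0.0, 1.0, 2.0, 3.0, 4.0])

# Option A: flip the scores. The positive class is still "deleterious",
# with a prevalence of 2/8.
ap_flip_scores = average_precision_score(labels, -scores)

# Option B: flip the labels. The positive class is now "benign",
# with a prevalence of 6/8, so this is a different (easier) task.
ap_flip_labels = average_precision_score(1 - labels, scores)

print(f"flip scores: {ap_flip_scores:.3f}, flip labels: {ap_flip_labels:.3f}")
```

The two numbers differ even though the ranking is the same, because AUPRC depends on which class is treated as positive. AUROC, by contrast, would be identical under either flip.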