Spijkervet/CLMR

Possible major bug in evaluation

NotNANtoN opened this issue · 4 comments

Hi @Spijkervet !

I played around with the pretrained MagnaTagATune weights and noticed something odd: the predicted values were relatively low, and I was wondering what was going on.

I then discovered that in evaluation.py, line 29, there is output = torch.nn.functional.softmax(output, dim=1). But, as far as I understand, MagnaTagATune (and the MSD dataset) are multi-label tasks, and accordingly their loss function in the code is binary cross-entropy. Hence, I suppose that torch.sigmoid should be used there instead of the softmax.
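To illustrate why this matters (a minimal sketch with made-up logits, not code from the repo):

```python
import torch

# Logits for one clip over three tags. In a multi-label setting each
# tag is an independent yes/no decision.
logits = torch.tensor([[2.0, 1.5, -3.0]])

# softmax makes the tags compete for a single unit of probability mass,
# so even confidently-present tags get depressed scores:
print(torch.nn.functional.softmax(logits, dim=1))  # ≈ [[0.62, 0.38, 0.004]]

# sigmoid scores each tag independently, which matches the
# binary cross-entropy training objective:
print(torch.sigmoid(logits))  # ≈ [[0.88, 0.82, 0.05]]
```

This would also explain the low predicted values I saw: with ~50 tags, the softmax spreads the probability mass across all of them.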

Please let me know if I'm wrong, but as I see it now, this could change the results of CLMR quite significantly (for the better), given that this code was used to generate them.

Hi @NotNANtoN !

Thank you for your accurate observation. At some point I changed the BCELoss to BCEWithLogitsLoss (removing an nn.Sigmoid() call) in the training scheme, but missed adding the corresponding sigmoid operation during evaluation. I will report the results back to you, thank you!
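For context, the intended pairing looks roughly like this (a minimal, self-contained sketch; the model, dimensions, and data are stand-ins, not the actual CLMR code):

```python
import torch

# Toy setup: a linear classifier head over 512-d embeddings and 50 tags.
num_tags = 50
model = torch.nn.Linear(512, num_tags)
x = torch.randn(8, 512)                               # a batch of 8 embeddings
targets = torch.randint(0, 2, (8, num_tags)).float()  # multi-hot tag vectors

# Training: BCEWithLogitsLoss applies the sigmoid internally,
# so the head outputs raw logits (hence the removed nn.Sigmoid()).
criterion = torch.nn.BCEWithLogitsLoss()
loss = criterion(model(x), targets)
loss.backward()

# Evaluation: since the sigmoid lives inside the loss, it has to be
# applied explicitly here to turn logits into per-tag probabilities.
with torch.no_grad():
    probs = torch.sigmoid(model(x))
```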

Using a sigmoid instead of the softmax changes the results as follows:

| Evaluation head | Metric | Reported | Corrected |
| --- | --- | --- | --- |
| Linear classifier | ROC-AUC_tag | 88.49 | 88.73 |
| Linear classifier | PR-AUC_tag | 35.37 | 35.58 |
| 2-layer MLP | ROC-AUC_tag | 89.3 | 89.3 |
| 2-layer MLP | PR-AUC_tag | 35.9 | 36.0 |

(the values obtained in recent commits are a bit lower because they indeed used .softmax instead of .sigmoid for multi-label datasets)
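For anyone reproducing these numbers, the tag-wise metrics can be computed along these lines (a sketch using scikit-learn with random placeholder data, not necessarily the exact evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Placeholder data: 100 clips, 50 tags. In practice y_true holds the
# multi-hot ground-truth tags and probs the sigmoid outputs.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 50))
probs = rng.random((100, 50))

# The _tag metrics above average each metric over the tags (macro average).
roc_auc_tag = roc_auc_score(y_true, probs, average="macro")
pr_auc_tag = average_precision_score(y_true, probs, average="macro")
print(roc_auc_tag, pr_auc_tag)
```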

Thanks again for your accurate observation! I hope our work has been useful to you :)