Possible major bug in evaluation
NotNANtoN opened this issue · 4 comments
Hi @Spijkervet !
I played around with the pretrained MagnaTagATune weights and discovered something. The predicted values were relatively low, and I was wondering why.
I then discovered that evaluation.py applies output = torch.nn.functional.softmax(output, dim=1) in line 29. But, as far as I understand, MagnaTagATune (and the MSD dataset) are multi-label tasks; accordingly, their loss functions in the code are also binary cross-entropy. Hence, I suppose that torch.sigmoid should be used there instead of the softmax.
Please let me know if I'm wrong, but as I see it now this could change the results of CLMR quite significantly (for the better), given that this code was used to generate the results.
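To illustrate why this matters: softmax normalizes across the tag dimension, forcing tag probabilities to compete and sum to 1, while sigmoid scores each tag independently, which is what a multi-label BCE objective assumes. A minimal sketch with made-up logits (the shapes and values are hypothetical, not taken from CLMR):

```python
import torch

# Hypothetical logits for 3 clips and 5 tags (multi-label setting).
logits = torch.tensor([[2.0, -1.0, 0.5, -3.0, 1.5],
                       [0.1, 0.2, -0.5, 2.5, -2.0],
                       [1.0, 1.0, 1.0, 1.0, 1.0]])

# What evaluation.py reportedly does: softmax couples the tags so their
# probabilities compete and sum to 1 per clip.
softmax_probs = torch.nn.functional.softmax(logits, dim=1)

# Proposed fix: sigmoid gives an independent probability per tag,
# consistent with a binary cross-entropy training objective.
sigmoid_probs = torch.sigmoid(logits)

print(softmax_probs.sum(dim=1))  # each row sums to 1
print(sigmoid_probs.sum(dim=1))  # rows need not sum to 1
```

Note that ROC-AUC and PR-AUC are rank-based per tag, so the two transforms only change the metrics where the cross-tag normalization reshuffles scores between clips.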
Hi @NotNANtoN !
Thank you for your accurate observation. At some point, I changed the BCELoss to BCEWithLogitsLoss (removing an nn.Sigmoid() call) for the training scheme, but missed adding a sigmoid operation during evaluation. I will report back the results to you, thank you!
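For context, BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability, so the model is trained on raw logits and the explicit sigmoid must be reintroduced at evaluation time. A small sketch of the equivalence (random toy tensors, not CLMR data):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)                      # raw model outputs
targets = torch.randint(0, 2, (4, 10)).float()   # multi-label targets

# BCEWithLogitsLoss applies the sigmoid internally ...
loss_fused = torch.nn.BCEWithLogitsLoss()(logits, targets)

# ... so it matches BCELoss on explicitly sigmoid-ed outputs.
loss_manual = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_fused.item(), loss_manual.item())  # numerically equal
```

This is why dropping the nn.Sigmoid() module from the model without adding torch.sigmoid in evaluation.py silently leaves the evaluation operating on unsquashed logits.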
This changes results for a linear classifier from the reported:
ROC-AUC_tag = 88.49
PR-AUC_tag = 35.37
To:
ROC-AUC_tag = 88.73
PR-AUC_tag = 35.58
And it changes results for a 2-layer multi-layer perceptron from the reported:
ROC-AUC_tag = 89.3
PR-AUC_tag = 35.9
To:
ROC-AUC_tag = 89.3
PR-AUC_tag = 36.0
(the values obtained in recent commits are a bit lower because they indeed used .softmax instead of .sigmoid for multi-label datasets)
Thanks again for your accurate observation! I hope our work has been useful to you :)