allenai/scibert

BC5CDR Performance - Representation


Hello!

I am currently working on an NER research project that uses AllenNLP as its backend, and one of the datasets we're using to evaluate our model is BC5CDR. We have been using ELMo embeddings so far and would like to switch to SciBERT.

However, after browsing data/ner/bc5cdr, I noticed that the data does not differentiate between chemicals and diseases: the fourth column, which holds the entity label, does not indicate which of the two an entity is. Given this, I would like to know whether the BC5CDR test F1 of 88.94 reported here and in the SciBERT paper was obtained by treating chemicals and diseases as a single entity type.
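
For reference, this is roughly how I checked the labels. It is only a minimal sketch: it assumes whitespace-separated columns with the label in the fourth one and BIO-style prefixes such as `B-`/`I-`. The function name `entity_types`, the sample rows, and the `Entity` tag are made up for illustration and are not copied from the repository files.

```python
from collections import Counter
from typing import Iterable


def entity_types(lines: Iterable[str], label_column: int = 3) -> Counter:
    """Count entity types found in the label column of CoNLL-style lines."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        # Skip blank sentence separators and rows with too few columns.
        if len(parts) <= label_column:
            continue
        label = parts[label_column]
        if label == "O":
            continue
        # Drop a BIO-style prefix ("B-", "I-") if present, keeping only the type.
        entity_type = label.split("-", 1)[1] if "-" in label else label
        counts[entity_type] += 1
    return counts


# Hypothetical rows for illustration only, not real dataset content.
sample = [
    "Naloxone NN O B-Entity",
    "reverses VBZ O O",
    "",
    "hypotensive JJ O B-Entity",
    "effect NN O I-Entity",
]
print(sorted(entity_types(sample)))
```

Running the same function over an open file handle for one of the files under data/ner/bc5cdr should list every entity type present. If chemicals and diseases were labeled separately, I would expect two distinct types to show up rather than one.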

Thank you!