xhuang28/NewBioNer

Problem with the F1 score on JNLPBA?

Eulring opened this issue · 4 comments

The local-prediction result I get on the JNLPBA dataset is about 0.72, which is far from the 0.83 reported in the paper. Is there anything wrong, e.g. with the metric selected for multi-entity evaluation?
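For concreteness, this is the kind of entity-level (exact-match, CoNLL-style) scoring I mean. The snippet below is only a sketch and assumes the `seqeval` package, which may not be what this repo actually uses; the tag examples are hypothetical.

```python
# Sketch of entity-level (CoNLL-style) F1 on BIO-tagged output.
# seqeval is an assumed dependency here; the repo may score predictions differently.
from seqeval.metrics import precision_score, recall_score, f1_score

# One list of BIO tags per sentence (gold vs. predicted), hypothetical examples.
y_true = [["B-protein", "I-protein", "O", "B-DNA"]]
y_pred = [["B-protein", "I-protein", "O", "O"]]

print("P:", precision_score(y_true, y_pred))  # exact-match entity precision
print("R:", recall_score(y_true, y_pred))     # exact-match entity recall
print("F:", f1_score(y_true, y_pred))         # 2PR / (P + R)
```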

By the way, the results on the other four datasets agree with the paper.

Thanks for the comment! You are probably right; the scores are somewhat too high. In a previous submission of the paper (made by Li Dong before I joined), the score was 73.19 (you may use that score when comparing with our work). I revised the model later, but it shouldn't make that much difference. Unfortunately, I no longer have access to the server that stores all the logs. I only have some detailed scores (F/P/R for JNLPBA):

| Model | F | P | R |
| --- | --- | --- | --- |
| STM | 72.73 | 70.58 | 75.01 |
| MTM | 73.48 | 70.83 | 76.33 |
| UM-01 | 80.90 | 85.11 | 77.09 |
| UM-11 | 80.85 | 76.56 | 85.65 |
| UM-00 | 83.82 | 83.74 | 83.91 |
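(As a sanity check, these rows are internally consistent with F1 = 2PR/(P+R); e.g. for STM, 2 · 70.58 · 75.01 / (70.58 + 75.01) ≈ 72.73.)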

I'll look into it, but I can't guarantee I'll find the correct scores. The local prediction is not the focus of the paper, and we are not claiming to beat baselines on it. It's good to know the rest of the scores are correct.

Got it. Thanks for the reply.

Hello @xhuang28, we are comparing with your work in our new research. I wonder if it is convenient for you to provide the F1 scores of local evaluation on the five training datasets (BC2GM, BC4CHEM, JNLPBA, NCBI, Linnaeus) in <xx.xx> format for CRF00, CRF01, CRF11, and MTM.

| Model | BC2GM | BC4CHEM | NCBI | JNLPBA | Linnaeus |
| --- | --- | --- | --- | --- | --- |
| STM | 79.87 | 88.59 | 84.11 | 72.73 | 87.33 |
| MTM | 80.27 | 89.23 | 85.77 | 73.48 | 88.54 |
| UM-01 | 70.94 | 83.47 | 79.81 | 80.90 | 79.94 |
| UM-11 | 74.24 | 84.13 | 80.45 | 80.85 | 80.69 |
| UM-00 | 79.12 | 87.27 | 83.98 | 83.82 | 83.93 |