Is the phrase count different from the one stated in your paper on ShARe/CLEFE?
sy-wada opened this issue · 0 comments
Hi there,
I found that the number of phrases detected differs from the number reported in the paper when I use conlleval.py to evaluate NER predictions on ShARe/CLEFE.
For example:
from conlleval import evaluate, metrics, report_notprint
# Test.tsv#L112-L118 in ShARe/CLEFE
text = """\
The O O
left B B
atrium I I
is O O
moderately O O
dilated I O
. O O
"""
seq = text.split('\n')
count = evaluate(seq)
print(metrics(count)[0])
print(''.join(report_notprint(count)))
The output is:
Metrics(tp=1, fp=0, fn=1, prec=1.0, rec=0.5, fscore=0.6666666666666666)
processed 7 tokens with 2 phrases; found: 1 phrases; correct: 1.
accuracy: 85.71%; precision: 100.00%; recall: 50.00%; FB1: 66.67
: precision: 100.00%; recall: 50.00%; FB1: 66.67 1
It seems to me that you treat "left atrium dilated" as one phrase in the paper, but conlleval.py counts "left atrium" and "dilated" as two separate gold phrases.
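To make the difference concrete, here is a minimal sketch of the two ways of counting. `count_phrases` and its `merge_disjoint` flag are hypothetical helpers of my own, not part of conlleval.py, and the merged behaviour is only my guess at how the paper counts disjoint mentions.

```python
def count_phrases(tags, merge_disjoint=False):
    """Count phrases in a list of B/I/O tags for one sentence.

    merge_disjoint=False: an I tag that follows O opens a new phrase,
    which reproduces the gold count of 2 reported above.
    merge_disjoint=True: an I tag that follows O is attached to the
    most recent phrase (an assumption about the paper, not confirmed).
    """
    count = 0
    prev = 'O'
    for tag in tags:
        if tag == 'B':
            count += 1
        elif tag == 'I' and prev == 'O':
            # Orphan I: a new phrase, unless it is merged into an earlier one.
            if not (merge_disjoint and count > 0):
                count += 1
        prev = tag
    return count


# Gold and predicted tag columns from the example above.
gold = ['O', 'B', 'I', 'O', 'O', 'I', 'O']
pred = ['O', 'B', 'I', 'O', 'O', 'O', 'O']

print(count_phrases(gold))                       # 2
print(count_phrases(gold, merge_disjoint=True))  # 1
print(count_phrases(pred))                       # 1
```

With the merged counting, the gold column contains a single phrase, which is what I would expect if "left atrium dilated" is one disjoint mention.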
Is there a recommended way to handle such disjoint phrases so that the counts match the paper?
Thanks!