UniversalDependencies/tools

eval.py reports higher than 100 aligned accuracy on enhanced dependencies

AngledLuffa opened this issue · 2 comments

ELAS and EULAS scores are higher than 100:

python3 eval.py UD_English-EWT/en_ewt-ud-train.conllu UD_English-EWT/en_ewt-ud-train.conllu -v
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |    100.00 |    100.00 |    100.00 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |    100.00 |    100.00 |    100.00 |
UPOS       |    100.00 |    100.00 |    100.00 |    100.00
XPOS       |    100.00 |    100.00 |    100.00 |    100.00
UFeats     |    100.00 |    100.00 |    100.00 |    100.00
AllTags    |    100.00 |    100.00 |    100.00 |    100.00
Lemmas     |    100.00 |    100.00 |    100.00 |    100.00
UAS        |    100.00 |    100.00 |    100.00 |    100.00
LAS        |    100.00 |    100.00 |    100.00 |    100.00
ELAS       |    100.00 |    100.00 |    100.00 |    105.02    <---
EULAS      |    100.00 |    100.00 |    100.00 |    105.02   <---
CLAS       |    100.00 |    100.00 |    100.00 |    100.00
MLAS       |    100.00 |    100.00 |    100.00 |    100.00
BLEX       |    100.00 |    100.00 |    100.00 |    100.00

If I had to guess without actually looking at the code, maybe it's getting extra credit for lines where there is more than one enhanced dependency to count?

Also, this happens if I do git checkout 799292f54c699fd2ccf90b0b890a0533ccf35fd4 in order to go earlier than my recent changes, so definitely not my fault :P

My intuition is 100% correct:

Count of aligned lines, ignoring multiplicity (tools/eval.py, line 506 in 77500d7):

aligned = len(alignment.matched_words)

Possibility of multiple +1 for a single line (tools/eval.py, line 513 in 77500d7):

for (sparent,sdep) in system_deps:
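
To make the mismatch concrete, here is a minimal sketch of how those two lines could combine to push the ratio past 100. The helper name `enhanced_aligned_accuracy` and the toy sentence are made up for illustration and are not taken from eval.py. Only the shape of the computation follows the snippets quoted above: one +1 per matching enhanced dependency, divided by the number of aligned words.

```python
def enhanced_aligned_accuracy(gold_deps_per_word, system_deps_per_word):
    """Hypothetical helper mimicking the counting pattern described above."""
    # Denominator: one per aligned word, no matter how many deps it carries.
    aligned = len(gold_deps_per_word)
    correct = 0
    for gold_deps, system_deps in zip(gold_deps_per_word, system_deps_per_word):
        # Numerator: one +1 per matching (parent, deprel) pair, so a word
        # with two enhanced dependencies can contribute 2.
        for (sparent, sdep) in system_deps:
            if (sparent, sdep) in gold_deps:
                correct += 1
    return 100.0 * correct / aligned


# Toy sentence "She wants to leave ." with made-up enhanced dependencies.
# Word 1 has two incoming enhanced edges; every other word has one.
gold = [
    [(2, "nsubj"), (4, "nsubj:xsubj")],
    [(0, "root")],
    [(4, "mark")],
    [(2, "xcomp")],
    [(2, "punct")],
]

# Evaluate gold against itself, as in the report above.
score = enhanced_aligned_accuracy(gold, gold)
print(f"{score:.2f}")
```

With 5 aligned words and 6 matching enhanced dependencies, the toy example gives 6 / 5, i.e. a score above 100 even though system and gold are identical.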

I'd fix it, but I don't know what we should make "aligned accuracy" represent in this case, if anything. Perhaps an empty column is the most appropriate?
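
As a rough sketch of the "empty column" option: the row formatter could treat aligned accuracy as optional and print nothing when it is undefined. The function `format_row` below is hypothetical. It only imitates the column widths of the table above and is not claimed to match eval.py or the fix that was eventually applied.

```python
def format_row(metric, precision, recall, f1, aligned_accuracy=None):
    """Hypothetical formatter: leaves the last column blank when
    aligned accuracy is None (i.e. not meaningful for this metric)."""
    aligned = "" if aligned_accuracy is None else f"{aligned_accuracy:10.2f}"
    return f"{metric:11}|{precision:10.2f} |{recall:10.2f} |{f1:10.2f} |{aligned}"


# LAS keeps its aligned accuracy; ELAS is printed with an empty column.
las_row = format_row("LAS", 100.0, 100.0, 100.0, 100.0)
elas_row = format_row("ELAS", 100.0, 100.0, 100.0)
print(las_row)
print(elas_row)
```

This mirrors how the Tokens, Sentences and Words rows already appear in the output above, where the last column is simply left blank.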

Thanks for reporting. I agree that aligned accuracy does not make sense here. Fixed.