hipe-eval/HIPE-2022-data

Annotation Issue (coarse.meto-fine.comp)

EmanuelaBoros opened this issue · 1 comment

Hello!

I've noticed a possibly missing entity type in COARSE-METO in HIPE-2022-v2.1-hipe2020-train-fr.tsv, where M. Théodore Reinach should (possibly) be a pers.ind (lines 2,141-2,150):

M	O	O	O	O	B-comp.title	O	_	_	NoSpaceAfter
.	O	O	O	O	I-comp.title	O	_	_	_
Théodore	O	O	O	O	B-comp.name	O	_	_	_
Reinach	O	O	O	O	I-comp.name	O	_	_	NoSpaceAfter
,	O	O	O	O	O	O	_	_	_
député	O	O	O	O	B-comp.function	O	_	_	_
radical	O	O	O	O	I-comp.function	O	_	_	_
de	O	O	O	O	I-comp.function	O	_	_	_
la	O	O	O	O	I-comp.function	O	_	_	EndOfLine
Savoie	B-loc	O	B-loc.adm.reg	O	I-comp.function	O	Q12745	_	NoSpaceAfter

Because I run several evaluation processes on my side, I'll also be checking the other annotated files in more depth, and I'll open a separate issue for each problem I find (if any).
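A check along these lines could be scripted. The sketch below is only an assumption about how one might do it: it takes the column positions from the rows pasted above (component tag in the sixth field, the two coarse tags in the second and third fields), assumes that metadata lines start with `#` and that the header row starts with `TOKEN`, and uses a hypothetical helper name, `tokens_missing_coarse`.

```python
def tokens_missing_coarse(lines, comp_col=5, coarse_cols=(1, 2)):
    """Return (line number, token, component tag) for every token that
    carries a component tag but no tag in any of the coarse columns.

    Column indices are 0-based and are assumptions taken from the rows
    quoted in this issue, not from the official format description.
    """
    empty = ("O", "_")
    hits = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and (assumed) metadata lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip the (assumed) header row and rows that are too short.
        if fields[0] == "TOKEN" or len(fields) <= comp_col:
            continue
        has_comp = fields[comp_col] not in empty
        no_coarse = all(fields[c] in empty for c in coarse_cols)
        if has_comp and no_coarse:
            hits.append((lineno, fields[0], fields[comp_col]))
    return hits


# A few of the rows quoted above, with explicit tab separators.
SAMPLE_ROWS = [
    "M\tO\tO\tO\tO\tB-comp.title\tO\t_\t_\tNoSpaceAfter",
    "Reinach\tO\tO\tO\tO\tI-comp.name\tO\t_\t_\tNoSpaceAfter",
    ",\tO\tO\tO\tO\tO\tO\t_\t_\t_",
    "Savoie\tB-loc\tO\tB-loc.adm.reg\tO\tI-comp.function\tO\tQ12745\t_\tNoSpaceAfter",
]

for lineno, token, tag in tokens_missing_coarse(SAMPLE_ROWS):
    print(lineno, token, tag)
```

On these rows, `M` and `Reinach` are reported, while `Savoie` is not, because it carries a coarse `B-loc` tag of its own. To scan a whole file, the same function can be given an open file handle, e.g. `tokens_missing_coarse(open(path, encoding="utf-8"))`.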

e-maud commented

Many thanks for spotting this.

After some investigation, here are a few (strange) findings, noted for information and future reference.

  • The mention is in document EXP-1908-01-21-a-i0053 line 2130 (in HIPE-2022-data file and in CLEF-HIPE-2020-internal file).

  • In INCEpTION, the mention appears correctly annotated.

  • In the exported annotations in EXP-1908-01-21-a-i0053.xmi (CLEF-HIPE-2020-internal), the pers.ind annotation is missing. See line 1075 onwards (permalink), where only the comp.title, comp.name and comp.function components (plus the nested loc.adm.reg) are exported:
    <custom:ImpressoNamedEntity xmi:id="12992" sofa="1" begin="2104" end="2106" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.title"/>
    <custom:ImpressoNamedEntity xmi:id="13005" sofa="1" begin="2107" end="2123" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.name"/>
    <custom:ImpressoNamedEntity xmi:id="13018" sofa="1" begin="2125" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.function"/>
    <custom:ImpressoNamedEntity xmi:id="13031" sofa="1" begin="2146" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="loc.adm.reg" wikidata_id="http://www.wikidata.org/entity/Q12745"/>

We will correct this in the current data and log the change. However, since the problem seems to come from the xmi2IOB step, I wonder whether other (complex) entities are also lost along the way, and whether it is worth checking the export code.
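To gauge how widespread this is, the exported XMI files could be scanned for component annotations that have no enclosing entity. The following is a rough sketch, not the project's actual tooling: the function name `orphan_components` is hypothetical, elements are matched on the local name `ImpressoNamedEntity` seen in the excerpt above, and the namespace URIs in the sample are placeholders standing in for whatever the real files declare.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def orphan_components(root):
    """Return (begin, end, value) for each comp.* annotation that is not
    enclosed by any non-component annotation in the same document."""
    spans = [
        (int(el.get("begin")), int(el.get("end")), el.get("value"))
        for el in root.iter()
        if isinstance(el.tag, str)
        and local_name(el.tag) == "ImpressoNamedEntity"
        and el.get("value")
    ]
    # Any annotation that is not a component is a candidate parent.
    parents = [s for s in spans if not s[2].startswith("comp.")]
    return [
        c for c in spans
        if c[2].startswith("comp.")
        and not any(p[0] <= c[0] and c[1] <= p[1] for p in parents)
    ]


# Offsets and values copied from the excerpt above; the namespace URIs
# are placeholders, and the other attributes are left out for brevity.
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
                     xmlns:custom="http:///custom.ecore">
  <custom:ImpressoNamedEntity xmi:id="12992" begin="2104" end="2106" value="comp.title"/>
  <custom:ImpressoNamedEntity xmi:id="13005" begin="2107" end="2123" value="comp.name"/>
  <custom:ImpressoNamedEntity xmi:id="13018" begin="2125" end="2152" value="comp.function"/>
  <custom:ImpressoNamedEntity xmi:id="13031" begin="2146" end="2152" value="loc.adm.reg"/>
</xmi:XMI>"""

for begin, end, value in orphan_components(ET.fromstring(SAMPLE)):
    print(begin, end, value)
```

On the sample, all three components are reported: the nested loc.adm.reg span (2146-2152) does not enclose any of them, and no annotation covers 2104-2152 as a pers.ind would. One limitation of this heuristic is that a nested entity whose span exactly matches a component would mask that component.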

@mromanello What do you think?