hipe-eval/HIPE-2022-data

Annotation Issue (coarse.meto-fine.comp)


Hello!

I've noticed a possibly missing entity type in the COARSE-METO column of HIPE-2022-v2.1-hipe2020-train-fr.tsv, where M. Théodore Reinach should possibly be tagged as pers.ind (lines 2,141-2,150):

M	O	O	O	O	B-comp.title	O	_	_	NoSpaceAfter
.	O	O	O	O	I-comp.title	O	_	_	_
Théodore	O	O	O	O	B-comp.name	O	_	_	_
Reinach	O	O	O	O	I-comp.name	O	_	_	NoSpaceAfter
,	O	O	O	O	O	O	_	_	_
député	O	O	O	O	B-comp.function	O	_	_	_
radical	O	O	O	O	I-comp.function	O	_	_	_
de	O	O	O	O	I-comp.function	O	_	_	_
la	O	O	O	O	I-comp.function	O	_	_	EndOfLine
Savoie	B-loc	O	B-loc.adm.reg	O	I-comp.function	O	Q12745	_	NoSpaceAfter

As part of several evaluation processes on my side, I will also be checking the other annotated files in more depth, and will open a separate issue for each problem I find (if any).
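To look for more cases of this kind across the other files, a quick scan of the TSV files could help. The following is a minimal, hypothetical sketch (the name `orphan_components` is illustrative, not part of any HIPE tooling). It assumes the ten-column HIPE-2022 layout (TOKEN, NE-COARSE-LIT, NE-COARSE-METO, NE-FINE-LIT, NE-FINE-METO, NE-FINE-COMP, NE-NESTED, NEL-LIT, NEL-METO, MISC) and flags tokens that carry a component tag but no entity in either coarse column, as happens with M. Théodore Reinach above.

```python
# Assumed HIPE-2022 column order: 10 tab-separated fields per token line.
COLUMNS = ["TOKEN", "NE-COARSE-LIT", "NE-COARSE-METO", "NE-FINE-LIT",
           "NE-FINE-METO", "NE-FINE-COMP", "NE-NESTED", "NEL-LIT",
           "NEL-METO", "MISC"]


def is_tagged(value):
    """True if a NE column holds an actual tag (not "O" or the "_" filler)."""
    return value not in ("O", "_")


def orphan_components(tsv_text):
    """Return (line_no, token, comp_tag) for every token that has a
    component tag (NE-FINE-COMP) but no entity in either coarse column."""
    flagged = []
    for line_no, line in enumerate(tsv_text.splitlines(), start=1):
        # Skip blank lines, "# " metadata/comment lines and the header row.
        if not line.strip() or line.startswith("# ") or line.startswith("TOKEN\t"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # malformed line: ignored in this sketch
        row = dict(zip(COLUMNS, fields))
        has_comp = is_tagged(row["NE-FINE-COMP"])
        has_coarse = (is_tagged(row["NE-COARSE-LIT"])
                      or is_tagged(row["NE-COARSE-METO"]))
        if has_comp and not has_coarse:
            flagged.append((line_no, row["TOKEN"], row["NE-FINE-COMP"]))
    return flagged


# Three token lines taken from the excerpt above.
sample = "\n".join([
    "M\tO\tO\tO\tO\tB-comp.title\tO\t_\t_\tNoSpaceAfter",
    ".\tO\tO\tO\tO\tI-comp.title\tO\t_\t_\t_",
    "Savoie\tB-loc\tO\tB-loc.adm.reg\tO\tI-comp.function\tO\tQ12745\t_\tNoSpaceAfter",
])
print(orphan_components(sample))
# [(1, 'M', 'B-comp.title'), (2, '.', 'I-comp.title')]
```

Line numbers are relative to the text passed in. Savoie is not reported because it is tagged B-loc in NE-COARSE-LIT, even though it is also part of the comp.function span.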

Many thanks for spotting this.

After some investigation, here are a few (strange) findings, for information and for the record.

  • The mention is in document EXP-1908-01-21-a-i0053, at line 2130 (both in the HIPE-2022-data file and in the CLEF-HIPE-2020-internal file).

  • In INCEpTION, the mention appears to be correctly annotated.

  • In the exported annotations in EXP-1908-01-21-a-i0053.xmi (CLEF-HIPE-2020-internal), however, the annotation is missing. See lines 1075 onwards, where only comp.title, comp.name and comp.function are exported for this mention (plus loc.adm.reg for Savoie):
    <custom:ImpressoNamedEntity xmi:id="12992" sofa="1" begin="2104" end="2106" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.title"/>
    <custom:ImpressoNamedEntity xmi:id="13005" sofa="1" begin="2107" end="2123" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.name"/>
    <custom:ImpressoNamedEntity xmi:id="13018" sofa="1" begin="2125" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.function"/>
    <custom:ImpressoNamedEntity xmi:id="13031" sofa="1" begin="2146" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="loc.adm.reg" wikidata_id="http://www.wikidata.org/entity/Q12745"/>
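The same gap can be spotted directly in the XMI export. The sketch below assumes only that each mention is an `ImpressoNamedEntity` element with `begin`, `end` and `value` attributes, as in the lines above, and lists comp.* annotations that are not contained in any pers.* annotation. The helper names and the namespace URIs in the sample are hypothetical placeholders; tags are matched by local name, so the real URIs do not matter.

```python
import xml.etree.ElementTree as ET


def entities(xmi_text):
    """Collect (begin, end, value) for every ImpressoNamedEntity element,
    matching on the local tag name so the namespace URI is irrelevant."""
    root = ET.fromstring(xmi_text)
    found = []
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "ImpressoNamedEntity":
            found.append((int(elem.get("begin")), int(elem.get("end")),
                          elem.get("value") or ""))
    return found


def uncovered_components(ents):
    """Return comp.* annotations not contained in any pers.* annotation."""
    pers = [(b, e) for b, e, v in ents if v.startswith("pers")]
    return [(b, e, v) for b, e, v in ents
            if v.startswith("comp.")
            and not any(pb <= b and e <= pe for pb, pe in pers)]


# Trimmed copy of the four exported annotations (placeholder namespaces).
sample = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:custom="http:///webanno/custom.ecore">
  <custom:ImpressoNamedEntity xmi:id="12992" begin="2104" end="2106" value="comp.title"/>
  <custom:ImpressoNamedEntity xmi:id="13005" begin="2107" end="2123" value="comp.name"/>
  <custom:ImpressoNamedEntity xmi:id="13018" begin="2125" end="2152" value="comp.function"/>
  <custom:ImpressoNamedEntity xmi:id="13031" begin="2146" end="2152" value="loc.adm.reg"/>
</xmi:XMI>"""
print(uncovered_components(entities(sample)))
# [(2104, 2106, 'comp.title'), (2107, 2123, 'comp.name'), (2125, 2152, 'comp.function')]
```

All three comp.* annotations are reported because no pers.* mention covers them; with a pers.ind annotation spanning offsets 2104-2152 the list would be empty.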

We will correct this in the current data and log the change. However, since it seems to come from the xmi2IOB step, I wonder whether other (complex) entities were also lost along the way, and whether it is worth checking the export code.

@mromanello What do you think?