Annotation Issue (coarse.meto-fine.comp)
EmanuelaBoros opened this issue · 1 comments
Hello!
I've noticed a possible missing entity type in COARSE-METO in HIPE-2022-v2.1-hipe2020-train-fr.tsv, where M. Théodore Reinach should (possibly) be a pers.ind (lines 2,141-2,150):
M O O O O B-comp.title O _ _ NoSpaceAfter
. O O O O I-comp.title O _ _ _
Théodore O O O O B-comp.name O _ _ _
Reinach O O O O I-comp.name O _ _ NoSpaceAfter
, O O O O O O _ _ _
député O O O O B-comp.function O _ _ _
radical O O O O I-comp.function O _ _ _
de O O O O I-comp.function O _ _ _
la O O O O I-comp.function O _ _ EndOfLine
Savoie B-loc O B-loc.adm.reg O I-comp.function O Q12745 _ NoSpaceAfter
Because I am running several evaluation processes on my side, I'll also be checking the other annotated files in more depth, and I will open an issue for each problem I find (if any).
Many thanks for spotting this.
After some investigation, here are some (strange) findings, for information and as a memo.
- The mention is in document EXP-1908-01-21-a-i0053, line 2130 (in the HIPE-2022-data file and in the CLEF-HIPE-2020-internal file).
- In INCEpTION, the mention appears correctly annotated.
- In the exported annotations in EXP-1908-01-21-a-i0053.xmi (CLEF-HIPE-2020-internal), the annotation is not there. See lines 1075 and after (permalink), where only comp.title|comp.name|comp.function are exported:
<custom:ImpressoNamedEntity xmi:id="12992" sofa="1" begin="2104" end="2106" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.title"/>
<custom:ImpressoNamedEntity xmi:id="13005" sofa="1" begin="2107" end="2123" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.name"/>
<custom:ImpressoNamedEntity xmi:id="13018" sofa="1" begin="2125" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.function"/>
<custom:ImpressoNamedEntity xmi:id="13031" sofa="1" begin="2146" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="loc.adm.reg" wikidata_id="http://www.wikidata.org/entity/Q12745"/>
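To look for the same pattern in other exported files, one option is to list every comp.* span that is not covered by any non-component entity span. The following is only a minimal sketch, not the project's export code: `read_spans` and `orphan_components` are hypothetical helper names, and it assumes that each annotation is an `ImpressoNamedEntity` element with `begin`, `end` and `value` attributes, as in the four elements above.

```python
import xml.etree.ElementTree as ET


def read_spans(root):
    """Collect (begin, end, value) for every ImpressoNamedEntity under root."""
    return [
        (int(el.get("begin")), int(el.get("end")), el.get("value"))
        for el in root.iter()
        # Compare the local tag name only, so the namespace URI need not be known.
        if el.tag.split("}")[-1] == "ImpressoNamedEntity" and el.get("value")
    ]


def orphan_components(spans):
    """Return the comp.* spans not covered by any non-component entity span."""
    parents = [(b, e) for b, e, v in spans if not v.startswith("comp.")]
    return [
        (b, e, v)
        for b, e, v in spans
        if v.startswith("comp.")
        and not any(pb <= b and e <= pe for pb, pe in parents)
    ]


# Offsets and values copied from the four elements above.
spans = [
    (2104, 2106, "comp.title"),
    (2107, 2123, "comp.name"),
    (2125, 2152, "comp.function"),
    (2146, 2152, "loc.adm.reg"),
]
for begin, end, value in orphan_components(spans):
    print(begin, end, value)
```

For a file on disk, the spans could be obtained with `read_spans(ET.parse(path).getroot())`, where `path` points to an .xmi file. On the excerpt above, the nested loc.adm.reg span does not cover any component, so all three comp.* spans are reported.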
We will correct this in the current data and log the change, but since it seems to come from the xmi2IOB step, I wonder if other (complex) entities are also lost along the way, and whether it is worth checking the export code.
@mromanello What do you think?
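As a quick way to see whether other entities were lost, the released TSV files could be scanned for tokens that carry a component tag but no entity tag in either coarse column. This is a rough sketch under stated assumptions: `tokens_missing_parent` is a hypothetical name, the file is assumed to be tab-separated, and the column positions (COARSE-LIT second, COARSE-METO third, FINE-COMP sixth) are assumed to match the rows quoted above.

```python
def tokens_missing_parent(lines, lit_col=1, meto_col=2, comp_col=5):
    """Return (line number, token, component tag) for suspicious tokens.

    A token is reported when its component column is not "O" while both
    coarse columns are "O". Column indices are 0-based and are assumptions.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Comment and blank lines have too few fields and are skipped here.
        if len(fields) <= comp_col:
            continue
        if (
            fields[comp_col] != "O"
            and fields[lit_col] == "O"
            and fields[meto_col] == "O"
        ):
            hits.append((lineno, fields[0], fields[comp_col]))
    return hits


# Three of the rows quoted in this issue, rebuilt with tab separators.
rows = [
    "M\tO\tO\tO\tO\tB-comp.title\tO\t_\t_\tNoSpaceAfter",
    ",\tO\tO\tO\tO\tO\tO\t_\t_\t_",
    "Savoie\tB-loc\tO\tB-loc.adm.reg\tO\tI-comp.function\tO\tQ12745\t_\tNoSpaceAfter",
]
for hit in tokens_missing_parent(rows):
    print(hit)
```

On a real file, the function can be called with an open file handle, for example `tokens_missing_parent(open(path, encoding="utf-8"))`. A token such as Savoie is not reported, because the nested location fills the coarse column, so the check only catches the part of a lost entity that no other annotation covers.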