textclass properties on entities not honoured when interpreting wref/@t
folialint breaks on the following document with this error (foliavalidator does not complain):
XML error: WordRefence id=TEI.1.text.1.body.1.div1.1.head.1.s.1.w.3 has another value for the t attribute them it's reference. (Zuidhollanschen versus Zuydthollanschen)
It should compare against the text in the right textclass, which is explicitly specified at the entity level (textclass="contemporary").
'Minimal' FoLiA example (http://lst.science.ru.nl/~proycon/issue52.folia.xml):
<s xml:id="TEI.1.par">
<w xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.3" class="WORD" set="tokconfig-nld">
<t>Zuydthollanschen</t>
<t class="contemporary">Zuidhollanschen</t>
<pos class="SPEC(deeleigen)" confidence="1" head="SPEC" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" textclass="contemporary">
<feat class="deeleigen" subset="spectype"/>
</pos>
<lemma class="Zuidhollanschen" set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl" textclass="contemporary"/>
</w>
<w xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.4" class="WORD" set="tokconfig-nld" space="no">
<t>Synodi</t>
<t class="contemporary">Sijnodi</t>
<pos class="SPEC(deeleigen)" confidence="1" head="SPEC" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" textclass="contemporary">
<feat class="deeleigen" subset="spectype"/>
</pos>
<lemma class="Sijnodi" set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl" textclass="contemporary"/>
</w>
<entities xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.entities.1">
<entity xml:id="TEI.1.text.1.body.1.div1.1.head.1.s.1.entities.1.entity.1" class="pro" confidence="0.68202" set="http://ilk.uvt.nl/folia/sets/frog-ner-nl" textclass="contemporary">
<wref id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.3" t="Zuidhollanschen"/>
<wref id="TEI.1.text.1.body.1.div1.1.head.1.s.1.w.4" t="Sijnodi"/>
</entity>
</entities>
</s>
(Resolution needed for completion of INL/nederlab-linguistic-enrichment#12)
OK, the error is detected while parsing the wref node, before it is appended to the layer, so the textclass of the layer is not yet known. (It uses the textclass of the referenced Word instead, which is indeed wrong.)
The check probably has to be postponed to the post_append() method?
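A minimal sketch of the intended behaviour, written in Python with the standard xml.etree.ElementTree module rather than libfolia. The wref/@t comparison is deferred until the whole sentence has been parsed, so the textclass of the enclosing span annotation is known. This is roughly what moving the check into post_append() would achieve. The function names (word_text, check_wrefs), the un-namespaced fragment and the shortened ids are assumptions for illustration only. Real FoLiA documents use the FoLiA XML namespace, and this is not how libfolia implements the check. It relies on one FoLiA rule: a <t> without a class attribute belongs to the default textclass "current".

```python
import xml.etree.ElementTree as ET

# xml:id is parsed by ElementTree into the predefined XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, shortened, un-namespaced version of the example above
SAMPLE = """<s>
  <w xml:id="w.3"><t>Zuydthollanschen</t><t class="contemporary">Zuidhollanschen</t></w>
  <w xml:id="w.4"><t>Synodi</t><t class="contemporary">Sijnodi</t></w>
  <entities>
    <entity class="pro" textclass="contemporary">
      <wref id="w.3" t="Zuidhollanschen"/>
      <wref id="w.4" t="Sijnodi"/>
    </entity>
  </entities>
</s>"""


def word_text(word, textclass="current"):
    """Return the text of the <t> in the given textclass (no class attribute means 'current')."""
    for t in word.findall("t"):
        if t.get("class", "current") == textclass:
            return t.text
    return None


def check_wrefs(sentence):
    """Deferred check: compare each wref/@t with the referenced word's text
    in the textclass of the span annotation that contains the wref."""
    words = {w.get(XML_ID): w for w in sentence.iter("w")}
    errors = []
    for span in sentence.iter():
        refs = span.findall("wref")
        if not refs:
            continue
        # The span's textclass is only known once the span itself is available
        textclass = span.get("textclass", "current")
        for ref in refs:
            ref_id, expected = ref.get("id"), ref.get("t")
            if expected is None:
                continue  # the t attribute on wref is optional
            word = words.get(ref_id)
            actual = word_text(word, textclass) if word is not None else None
            if actual != expected:
                errors.append((ref_id, expected, actual))
    return errors


if __name__ == "__main__":
    print(check_wrefs(ET.fromstring(SAMPLE)))  # prints []
```

If the textclass attribute is removed from the entity, the same check falls back to "current" and reports both wrefs. That is the mismatch folialint reported, because it effectively ignored the entity's textclass.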
A good solution is not easy to find. For now, this check is disabled.
The check is disabled, but it should ideally be performed at some stage, so I'll keep this issue open as an enhancement.