Text offsets in morphemes are not checked?
kosloot opened this issue · 2 comments
kosloot commented
I added a new Arabic variant to the examples, arabic.2.2.1.folia.xml, with offset information everywhere.
Both folialint and foliavalidator accept this file. NICE!
BUT: when I replace the offset in the first morpheme with a way-off value like 666, foliavalidator still accepts the file. This is odd.
folialint, on the other hand, rejects it:
> folialint arabic.2.2.1.folia.xml
arabic.2.2.1.folia.xml failed: Unresolvable text: Text for morpheme(ID=, textclass='current'), has incorrect offset 666
original msg=Unresolvable text: Reference (ID Xar.p.1.s.1.w.1,class='current') found, but no text match at offset=666 Expected 'اسم' but got ''
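The folialint message shows the consistency rule that foliavalidator apparently skips here: a morpheme's text must be found in its parent word's text at the stated offset. Below is a minimal sketch of that comparison. It works on plain strings, not on the real libfolia or foliapy classes, and `TextRef`, `check_offset` and `check_morphemes` are hypothetical names that are not part of either library. It also assumes that offsets count Unicode code points, and it uses a hypothetical word text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRef:
    """Hypothetical stand-in for a child text (e.g. a morpheme's <t>) with an offset."""
    text: str
    offset: int


def check_offset(parent_text: str, child: TextRef) -> None:
    """Raise ValueError unless child.text occurs in parent_text at child.offset.

    Assumes offsets count Unicode code points, which matches Python string indexing.
    """
    if child.offset < 0:
        raise ValueError(f"negative offset {child.offset}")
    # Slicing past the end of the string yields '', so a way-off offset
    # produces the same "got ''" that folialint reports.
    found = parent_text[child.offset:child.offset + len(child.text)]
    if found != child.text:
        raise ValueError(
            f"no text match at offset={child.offset}: "
            f"expected {child.text!r} but got {found!r}"
        )


def check_morphemes(word_text: str, morphemes: List[TextRef]) -> List[str]:
    """Collect one error message per morpheme whose offset does not resolve."""
    errors = []
    for i, morpheme in enumerate(morphemes):
        try:
            check_offset(word_text, morpheme)
        except ValueError as exc:
            errors.append(f"morpheme {i}: {exc}")
    return errors


if __name__ == "__main__":
    word = "اسم"  # hypothetical word text, for illustration only
    print(check_morphemes(word, [TextRef("اسم", 0)]))    # correct offset
    print(check_morphemes(word, [TextRef("اسم", 666)]))  # way-off offset
```

With offset 0 the list is empty. With offset 666 it contains one message, which mirrors folialint's "Expected 'اسم' but got ''".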
proycon commented
Text consistency checking currently has some limitations (I'm not sure whether this case is one of them). If it is, the checking should be expanded and fixed.