proycon/foliapy

text offset in morphemes is not checked?

kosloot opened this issue · 2 comments

I added a new variant for Arabic to the examples: arabic.2.2.1.folia.xml with offset information everywhere.

Both folialint and foliavalidator accept this file. NICE!

BUT: when I replace the offset in the first morpheme with some way-off value like 666, it is still accepted by foliavalidator. This is odd.

folialint, on the other hand, rejects the modified file:

> folialint arabic.2.2.1.folia.xml 
arabic.2.2.1.folia.xml failed: Unresolvable text: Text for morpheme(ID=, textclass='current'), has incorrect offset 666
	original msg=Unresolvable text: Reference (ID Xar.p.1.s.1.w.1,class='current') found, but no text match at offset=666 Expected 'اسم' but got ''
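
To make the expected behaviour concrete, here is a minimal sketch of the kind of check the folialint message implies: the child's text must occur in the parent's text exactly at the given offset. This is plain Python, not foliapy's actual code. The function name `offset_matches` and the filler in `word_text` are hypothetical; only the morpheme text 'اسم' and the offset 666 come from the output above.

```python
def offset_matches(parent_text: str, child_text: str, offset: int) -> bool:
    """Return True if child_text occurs in parent_text exactly at offset."""
    # An offset outside the parent text can never match.
    if offset < 0 or offset + len(child_text) > len(parent_text):
        return False
    return parent_text[offset:offset + len(child_text)] == child_text


# Hypothetical parent text: the morpheme from the folialint message,
# followed by made-up filler so that the morpheme sits at offset 0.
word_text = "اسم" + "XYZ"

print(offset_matches(word_text, "اسم", 0))    # offset points at the morpheme
print(offset_matches(word_text, "اسم", 666))  # way-off offset, nothing to match
```

A validator applying this kind of comparison to morphemes would reject the 666 variant, which is what folialint does and what foliavalidator apparently does not.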

Text consistency checking currently has some limits (I'm not sure whether this is one of them). If it is, the checking should be expanded so that this case is caught.

Probably related to #15 and #8