Frog creates invalid FoLiA
kosloot opened this issue · 1 comments
Consider the following document:
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="strbug" generator="libfolia-v1.14" version="1.5.0">
<metadata type="native">
<annotations>
</annotations>
</metadata>
<text xml:id="strbug.text">
<p xml:id="p.1">
<t>Chipssnĳden</t>
<str xml:id="str.1">
<t>Chipssnĳden</t>
</str>
</p>
</text>
</FoLiA>
It contains the obsolete Dutch ĳ ligature (U+0133, a single character).
Frog handles this file by replacing the ĳ ligature with the two separate letters ij, which is wrong.
Replacing should only be done when a '--outputclass' is specified that differs from the '--inputclass' (which is "current" here).
In this case neither 'inputclass' nor 'outputclass' is specified, so both are "current", but that is apparently handled incorrectly.
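The intended rule can be sketched as follows. This is a minimal Python illustration, not Frog's actual C++ code: the function name `output_text` and the class name "norm" are hypothetical, and only the ligature replacement described in this issue is modelled.

```python
# Hypothetical sketch of the rule described above; not Frog's implementation.
IJ_SMALL = "\u0133"    # ĳ, LATIN SMALL LIGATURE IJ
IJ_CAPITAL = "\u0132"  # Ĳ, LATIN CAPITAL LIGATURE IJ


def output_text(text, inputclass="current", outputclass="current"):
    """Return the text to store under outputclass.

    The text is only rewritten when the output class differs from the
    input class; otherwise it must be passed through untouched.
    """
    if inputclass == outputclass:
        return text
    # Assumption: the only rewrite modelled here is the ligature replacement.
    return text.replace(IJ_SMALL, "ij").replace(IJ_CAPITAL, "IJ")
```

With both classes left at "current", `output_text("Chipssn\u0133den")` returns its input unchanged; the replacement only happens when a different output class is passed.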
This yields the erroneous document:
<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="strbug" generator="libfolia-v1.14" version="1.5.0">
<metadata type="native">
<annotations>
<token-annotation annotator="ucto" annotatortype="auto" datetime="2018-10-18T11:11:43" set="tokconfig-nld"/>
<pos-annotation annotator="frog-mbpos-1.0" annotatortype="auto" datetime="2018-10-18T11:11:43" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn"/>
<lemma-annotation annotator="frog-mblem-1.1" annotatortype="auto" datetime="2018-10-18T11:11:43" set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl"/>
</annotations>
</metadata>
<text xml:id="strbug.text">
<p xml:id="p.1">
<t>Chipssnĳden</t>
<str xml:id="str.1">
<t>Chipssnĳden</t>
</str>
<s xml:id="p.1.s.1">
<w xml:id="p.1.s.1.w.1" class="WORD">
<t>Chipssnijden</t>
<pos class="N(soort,mv,basis)" confidence="0.942748" head="N">
<feat class="soort" subset="ntype"/>
<feat class="mv" subset="getal"/>
<feat class="basis" subset="graad"/>
</pos>
<lemma class="chipssnijden"/>
</w>
</s>
</p>
</text>
</FoLiA>
In this document the 'deeper' text Chipssnijden (two separate letters i and j) from the Word does not match the Chipssnĳden (with the ĳ ligature) from the Paragraph, as folialint points out:
inconsistent text: node p(p.1) has a mismatch for the text in set:current
the element text ='Chipssnĳden'
the deeper text ='Chipssnijden'
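The kind of check that produces this message can be approximated in a few lines of Python. This is a simplified sketch and not how folialint itself works: the function name `paragraph_mismatches` is hypothetical, words are simply joined with single spaces, and text classes other than the default are ignored.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def paragraph_mismatches(folia_xml):
    """Return (id, element_text, deeper_text) for every inconsistent <p>."""
    root = ET.fromstring(folia_xml)
    mismatches = []
    for p in root.iter("{%s}p" % FOLIA_NS):
        own = p.find("f:t", NS)  # text directly on the paragraph
        words = [
            w.findtext("f:t", default="", namespaces=NS)
            for w in p.iterfind(".//f:w", NS)  # text of all nested words
        ]
        if own is None or not words:
            continue  # nothing to compare
        deeper = " ".join(words)
        if own.text != deeper:
            mismatches.append((p.get(XML_ID), own.text, deeper))
    return mismatches
```

Run on the erroneous document above, such a check would report paragraph p.1, because the paragraph keeps the ligature while the word text does not.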
OK, the problem was quite obscure:
frog -x filename.xml -X out.xml
DID work correctly, BUT
frog -X out.xml filename.xml
did NOT.
The reason is that when frog detected an XML file by its extension, it did not check whether inputclass was the same as outputclass.
When using -x to force XML input, this WAS checked.
Fixed now.
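One way to avoid this class of bug is to decide in a single place whether the text needs rewriting, regardless of how the XML input was recognised. The Python sketch below only illustrates that idea under stated assumptions: `Options`, `is_folia_input` and `plan` are hypothetical names, `force_xml` stands in for the -x flag, and none of this is taken from Frog's source.

```python
from dataclasses import dataclass


@dataclass
class Options:
    filename: str
    force_xml: bool = False      # stands in for the -x flag (assumption)
    inputclass: str = "current"
    outputclass: str = "current"


def is_folia_input(opts):
    """XML input is either forced or detected by the file extension."""
    return opts.force_xml or opts.filename.lower().endswith(".xml")


def plan(opts):
    """Return (treat_as_xml, rewrite_text), decided once for both paths."""
    treat_as_xml = is_folia_input(opts)
    # The class comparison runs no matter how the XML input was detected.
    rewrite_text = treat_as_xml and opts.inputclass != opts.outputclass
    return treat_as_xml, rewrite_text
```

In this sketch both invocation styles from the comment above end up with the same decision: XML input, no rewriting of the text.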