FoLiA nodes with 'mixed' structure

Question

kosloot opened this issue 6 years ago · 7 comments

Consider this example:

<?xml version='1.0' encoding='utf-8'?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" version="1.5.1" xml:id="page" generator="pynlpl.formats.folia-v1.5.1.88">
  <metadata type="native">
    <annotations>
      <token-annotation annotator="ucto" annotatortype="auto" datetime="2017-10-01T17:33:00" set="tokconfig-nld"/>
    </annotations>
    <meta id="language">nld</meta>
  </metadata>
  <text xml:id="text">
    <s xml:id="s.1"><t>test twee</t></s>
    <p xml:id="p1">
      <w xml:id="w.1">
        <t>test</t>
      </w>
      <w xml:id="w.2">
        <t>aha</t>
      </w>
      <s xml:id="s.2">
        <t>Een brief voor de koning.</t>
      </s>
    </p>
  </text>
</FoLiA>

Currently, Frog ignores the two words directly inside the paragraph and only processes the sentence within it.
This is questionable.
But if we do want to handle those two loose words, what should happen? Should we create a sentence out of them, or leave them separate?
This also involves Ucto, since Ucto is used to create the sentences (though not in the new Frog implementation we are working on).
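To make the problem concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` (an illustration only, not Frog's C++ code). A sentence-driven pipeline only visits `<s>` elements, so it sees `s.1` and `s.2`; a separate check finds `w.1` and `w.2`, which have no `<s>` ancestor and are therefore never covered. The function names `sentences_to_process` and `loose_words` are hypothetical.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace (as in the example) and the predefined xml:id attribute name.
FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
W = f"{{{FOLIA_NS}}}w"
S = f"{{{FOLIA_NS}}}s"

# Trimmed copy of the example document (metadata omitted).
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="page" version="1.5.1">
  <text xml:id="text">
    <s xml:id="s.1"><t>test twee</t></s>
    <p xml:id="p1">
      <w xml:id="w.1"><t>test</t></w>
      <w xml:id="w.2"><t>aha</t></w>
      <s xml:id="s.2"><t>Een brief voor de koning.</t></s>
    </p>
  </text>
</FoLiA>"""


def sentences_to_process(root):
    """A sentence-driven pipeline only visits <s> elements (and their contents)."""
    return [s.get(XML_ID) for s in root.iter(S)]


def loose_words(root):
    """Return the xml:ids of <w> elements that have no <s> ancestor."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent = {child: node for node in root.iter() for child in node}
    loose = []
    for w in root.iter(W):
        node = parent.get(w)
        while node is not None and node.tag != S:
            node = parent.get(node)
        if node is None:  # walked up to the root without meeting an <s>
            loose.append(w.get(XML_ID))
    return loose


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(sentences_to_process(root))  # → ['s.1', 's.2']
    print(loose_words(root))           # → ['w.1', 'w.2']
```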

Answer 1 · 2023-02-22T14:59:04.000Z

I just tested this, and the "problem" still exists. Frog will ignore the words test and aha.

@proycon, can we decide on this? Or should we just leave it as an oddity, caused by "someone" creating stupid FoLiA?

Answer 2 · 2023-02-22T15:06:34.000Z

Technically, ignoring the words is wrong. They are part of the text, just not grouped into a sentence. It may be weird and inconsistent, but it's not invalid FoLiA. It's perfectly okay, though, if Frog decides not to support this; I'd suggest exiting with an error if it encounters this pattern (not really a priority, though).

Answer 3 · 2023-02-22T15:19:16.000Z

Yes, it is valid, though weird, FoLiA.
Detecting this and generating an error is probably best, indeed.
Actually processing it is really cumbersome: it would mean inserting a new Sentence BEFORE the current Sentence in the paragraph, with id naming problems and such. It MUST be possible, but it's probably not worthwhile.
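The "insert a new Sentence before the current one" approach can be sketched roughly as follows, again in Python with ElementTree and not as Frog's implementation. Each run of consecutive `<w>` children of a paragraph is wrapped in a new `<s>` at the position of the run. The function `wrap_loose_words` and the id scheme `<paragraph-id>.s.<n>` are hypothetical; the loop that skips ids already in use is where the id naming problems show up. The sketch also ignores text consistency (`<t>` on the new sentence) entirely.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
W = f"{{{FOLIA_NS}}}w"
S = f"{{{FOLIA_NS}}}s"


def wrap_loose_words(paragraph, existing_ids):
    """Wrap each run of consecutive direct <w> children of `paragraph` in a
    new <s>, placed where the run was (so before any following <s>).

    `existing_ids` is a set of every xml:id already in the document; it is
    updated in place. The '<paragraph-id>.s.<n>' scheme is hypothetical.
    """
    pid = paragraph.get(XML_ID)
    children = list(paragraph)
    regrouped = []
    n = 0
    for is_word, group in groupby(children, key=lambda el: el.tag == W):
        group = list(group)
        if not is_word:
            regrouped.extend(group)  # keep existing <s> and other nodes as-is
            continue
        # The id naming problem: the obvious id may already be taken.
        n += 1
        while f"{pid}.s.{n}" in existing_ids:
            n += 1
        sid = f"{pid}.s.{n}"
        existing_ids.add(sid)
        sentence = ET.Element(S, {XML_ID: sid})
        # Add the <w> elements to the new sentence; they are detached
        # from the paragraph below.
        sentence.extend(group)
        regrouped.append(sentence)
    # Swap the paragraph's old child list for the regrouped one.
    for child in children:
        paragraph.remove(child)
    paragraph.extend(regrouped)
    return paragraph
```

With the example document, the paragraph `p1` would end up containing a new sentence (holding `w.1` and `w.2`) followed by `s.2`.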

Answer 4 · 2023-02-24T11:49:56.000Z

We had code to ignore this silently, but from now on we will throw an exception.
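A fail-fast check of the kind described here (raise instead of silently skipping) might look like this Python sketch. `MixedStructureError` and `check_no_mixed_paragraphs` are hypothetical names, and the check only flags a `<p>` that has both direct `<w>` and direct `<s>` children, which is the pattern from the example.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
P = f"{{{FOLIA_NS}}}p"
S = f"{{{FOLIA_NS}}}s"
W = f"{{{FOLIA_NS}}}w"


class MixedStructureError(ValueError):
    """Hypothetical error for a <p> that mixes loose <w> and <s> children."""


def check_no_mixed_paragraphs(root):
    """Raise MixedStructureError for the first <p> that has both direct <w>
    and direct <s> children; return None if there is no such paragraph."""
    for p in root.iter(P):
        child_tags = {child.tag for child in p}
        if W in child_tags and S in child_tags:
            loose = [child.get(XML_ID, "<no id>") for child in p if child.tag == W]
            raise MixedStructureError(
                f"paragraph {p.get(XML_ID)} has words outside any sentence: "
                + ", ".join(loose)
            )
```

A command-line caller would catch the error, report it and exit with a non-zero status, which corresponds to the "exit with an error" suggestion earlier in the thread.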

Answer 5 · 2023-02-24T13:30:34.000Z

Ok, I solved it. But the extra generated Sentence gets an xml:id, which may be surprising.
I still need to look into that.

Answer 6 · 2023-02-24T16:51:04.000Z

> Ok, I solved it. But the extra generated Sentence gets an xml:id, which may be surprising.

That solution was way too naive.
I reverted to the "throw it in your face" solution.

Answer 7 · 2023-05-05T07:25:18.000Z

So we'll leave it as it is for now.