TEI4HTR/page2tei

Encoding baseline coordinates with TEI: which attribute?

HugoSchtr opened this issue · 2 comments

We're currently encoding a baseline from a PAGE XML file as follows:

            <zone xml:id="eSc_line_26e1eb01"
                  type="mask"
                  points="1256,1599 1260,1548 1340,1548 1384,1566 1447,1548 1523,1548 1545,1566 1571,1548 1615,1548 1633,1566 1758,1555 1923,1566 1952,1552 1974,1566 2447,1563 2458,1603 2439,1625 2388,1629 1787,1629 1773,1614 1641,1629 1626,1614 1443,1614 1428,1629 1340,1614 1260,1629">
               <line type="baseline" points="1260,1603 1981,1599 2029,1607 2460,1605">c/ Edouard Eugène Lebourcq &amp; Eugène Pauline Potel 104 r. St Maur</line>
            </zone>

The <zone> element corresponds to the baseline's mask, and the <line> element to the baseline itself, holding both its text and its coordinates.

However, when the baseline coordinates are much simpler, with only two x,y pairs, as here:

            <zone xml:id="eSc_line_41fc839d"
                  type="mask"
                  points="293,1511 304,1478 351,1467 377,1478 384,1515 373,1548 296,1559">
               <line type="baseline" points="296,1515 385,1518">203</line>
            </zone>

The encoding then becomes invalid TEI, because the @points attribute requires at least three x,y pairs. See the documentation: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.coordinated.html
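To make the constraint concrete, here is a minimal Python sketch that counts the x,y pairs in a @points value and compares the count with a minimum. The thresholds are assumptions for illustration: three pairs for att.coordinated, as described in the linked documentation, and a hypothetical minimum of two for a baseline (a line needs at least two points). The regular expression is an approximation written for this example, not the pattern from the TEI schema, and the names count_pairs and has_enough_pairs are hypothetical.

```python
import re

# One "x,y" pair, integers or decimals, optionally negative.
# Illustrative approximation only, not the pattern from the TEI schema.
PAIR = re.compile(r"-?\d+(\.\d+)?,-?\d+(\.\d+)?")

# Assumed minimum pair counts (illustrative).
MIN_PAIRS_COORDINATED = 3  # @points from att.coordinated, e.g. on <zone> or <line>
MIN_PAIRS_BASELINE = 2     # hypothetical minimum for a baseline stored in <path>


def count_pairs(points: str) -> int:
    """Count the x,y pairs in a whitespace-separated @points value."""
    pairs = points.split()
    for pair in pairs:
        if not PAIR.fullmatch(pair):
            raise ValueError(f"malformed point: {pair!r}")
    return len(pairs)


def has_enough_pairs(points: str, minimum: int) -> bool:
    """True if @points contains at least `minimum` well-formed x,y pairs."""
    return count_pairs(points) >= minimum


if __name__ == "__main__":
    baseline = "296,1515 385,1518"  # two-pair baseline from eSc_line_41fc839d
    print(has_enough_pairs(baseline, MIN_PAIRS_COORDINATED))  # False
    print(has_enough_pairs(baseline, MIN_PAIRS_BASELINE))     # True
```

The two-pair baseline of eSc_line_41fc839d fails the three-pair check, while the four-pair baseline of eSc_line_26e1eb01 passes it, which matches the validation behaviour reported in this issue.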

How can we get around this problem?

After a discussion with @alix-tz, a solution was found to get around this issue:

            <zone xml:id="eSc_line_4218ebcd"
                  type="mask"
                  points="278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006">
               <path type="baseline" points="278,981 384,992"/>
               <line>199</line>
            </zone>

Here, the TEI <path> element represents the baseline's coordinates. Its @points attribute no longer requires at least three x,y pairs to be valid.

An additional note: we keep "baseline" as the value of @type on <path> for clarity and for consistency with PAGE, where the line's coordinates (whether baseline or topline) are stored in a "Baseline" element.
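The PAGE-to-TEI mapping discussed above can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration, not page2tei's actual conversion code. It assumes a PAGE <TextLine> with <Coords>, <Baseline> and <TextEquiv>/<Unicode> children, matches them by local name so the PAGE namespace version does not matter, and omits the TEI namespace for brevity. The helper names local, child and textline_to_tei are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the "xml" prefix, giving xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE_TEXTLINE = """<TextLine xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" id="eSc_line_4218ebcd">
  <Coords points="278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006"/>
  <Baseline points="278,981 384,992"/>
  <TextEquiv><Unicode>199</Unicode></TextEquiv>
</TextLine>"""


def local(tag: str) -> str:
    """Strip the namespace from an ElementTree tag, e.g. '{ns}Coords' -> 'Coords'."""
    return tag.rsplit("}", 1)[-1]


def child(elem: ET.Element, name: str):
    """Return the first direct child whose local name is `name`, or None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def textline_to_tei(textline: ET.Element) -> ET.Element:
    """Map one PAGE <TextLine> to a TEI <zone> holding a <path> baseline and a <line>."""
    zone = ET.Element("zone", {XML_ID: textline.get("id"), "type": "mask"})
    zone.set("points", child(textline, "Coords").get("points"))

    baseline = child(textline, "Baseline")
    if baseline is not None:
        # <path> rather than <line>: a baseline may have only two x,y pairs.
        ET.SubElement(zone, "path", {"type": "baseline", "points": baseline.get("points")})

    # The transcription goes into <line>, which carries no coordinates here.
    line = ET.SubElement(zone, "line")
    equiv = child(textline, "TextEquiv")
    unicode_el = child(equiv, "Unicode") if equiv is not None else None
    line.text = unicode_el.text if unicode_el is not None else ""
    return zone


if __name__ == "__main__":
    zone = textline_to_tei(ET.fromstring(SAMPLE_TEXTLINE))
    print(ET.tostring(zone, encoding="unicode"))
```

Running it prints a <zone> structurally equivalent to the eSc_line_4218ebcd solution above, with the two-pair baseline on <path> and the transcription in <line>. Keeping the baseline off <line> is what lets the output stay valid regardless of how many points the baseline has.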