Encoding baselines' coordinates with TEI: which attribute?
HugoSchtr opened this issue · 2 comments
We're currently encoding a baseline from a PAGE XML file as follows:
<zone xml:id="eSc_line_26e1eb01"
type="mask"
points="1256,1599 1260,1548 1340,1548 1384,1566 1447,1548 1523,1548 1545,1566 1571,1548 1615,1548 1633,1566 1758,1555 1923,1566 1952,1552 1974,1566 2447,1563 2458,1603 2439,1625 2388,1629 1787,1629 1773,1614 1641,1629 1626,1614 1443,1614 1428,1629 1340,1614 1260,1629">
<line type="baseline" points="1260,1603 1981,1599 2029,1607 2460,1605">c/ Edouard Eugène Lebourcq &amp; Eugène Pauline Potel 104 r. St Maur</line>
</zone>
The <zone> element corresponds to the baseline's mask, and the <line> element to the baseline itself, with its text node and its coordinates.
However, some baselines have much simpler coordinates, with only two x,y pairs, for example:
<zone xml:id="eSc_line_41fc839d"
type="mask"
points="293,1511 304,1478 351,1467 377,1478 384,1515 373,1548 296,1559">
<line type="baseline" points="296,1515 385,1518">203</line>
</zone>
The encoding becomes invalid TEI, because the points attribute requires at least three x,y pairs. See the documentation: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.coordinated.html
How can we get around this problem?
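To spot the affected lines before validation, the constraint can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the minimum of three pairs is taken from the att.coordinated documentation linked above, so it may need adjusting if your schema differs.

```python
import re

# Minimum number of x,y pairs for @points from att.coordinated,
# as described in this issue (assumption: your schema matches it).
MIN_PAIRS_COORDINATED = 3

# One "x,y" pair, integers or decimals.
PAIR_RE = re.compile(r"-?\d+(?:\.\d+)?,-?\d+(?:\.\d+)?")


def count_pairs(points: str) -> int:
    """Count the whitespace-separated x,y pairs in a points value."""
    pairs = points.split()
    for pair in pairs:
        if not PAIR_RE.fullmatch(pair):
            raise ValueError(f"not an x,y pair: {pair!r}")
    return len(pairs)


def is_valid_coordinated_points(points: str) -> bool:
    """True if the value has enough pairs for att.coordinated's @points."""
    return count_pairs(points) >= MIN_PAIRS_COORDINATED
```

With the two baselines shown above, the four-pair one passes and the two-pair one ("296,1515 385,1518") does not.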
After a discussion with @alix-tz, a solution was found to get around this issue:
<zone xml:id="eSc_line_4218ebcd"
type="mask"
points="278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006">
<path type="baseline" points="278,981 384,992"/>
<line>199</line>
</zone>
Here the <path> element holds the baseline's coordinates, and <line> keeps only the text. On <path>, the points attribute no longer needs at least three x,y pairs to be valid.
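For anyone generating these zones from a script, the structure above could be produced roughly like this. It is a minimal sketch using Python's standard xml.etree.ElementTree; the build_zone function and its parameters are hypothetical names, and the TEI namespace is left out for brevity, so a real export would need to add it.

```python
import xml.etree.ElementTree as ET

# Namespace of the xml: prefix, needed to write xml:id with ElementTree.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_zone(zone_id: str, mask_points: str,
               baseline_points: str, text: str) -> ET.Element:
    """Build a mask <zone> holding a baseline <path> and a text <line>."""
    zone = ET.Element("zone", {
        f"{{{XML_NS}}}id": zone_id,
        "type": "mask",
        "points": mask_points,
    })
    # The baseline coordinates go on <path>, which accepts two pairs.
    ET.SubElement(zone, "path", {"type": "baseline", "points": baseline_points})
    # The transcription stays in <line>, without a points attribute.
    line = ET.SubElement(zone, "line")
    line.text = text
    return zone


zone = build_zone(
    "eSc_line_4218ebcd",
    "278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006",
    "278,981 384,992",
    "199",
)
print(ET.tostring(zone, encoding="unicode"))
```

Using <path> for every baseline, not only the two-point ones, keeps the encoding uniform across the file.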