faustedition/faust-xml

significant unwanted whitespace


When I look at these lines

<hi rend="underline">Sph<anchor xml:id="Sphynx"/><mod rend="strikethrough" hand="#g_t">
<mod rend="strikethrough" hand="#g_bl">y</mod>
</mod>nx.</hi>

I see significant unwanted whitespace around <mod rend="strikethrough" hand="#g_bl">y</mod>, which splits the word into "Sph y nx":

(screenshot)
(http://dev.faustedition.net/document?sigil=2_H&page=150&view=document)

I see this in several places in the file, but I cannot find any commit that could have introduced it. @thvitt, can you think of a reason? Could this be related to the upconversion to modern TEI?
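To make the effect reproducible outside the viewer, here is a minimal sketch that concatenates the text nodes of the fragment quoted above. It assumes nothing about the actual rendering pipeline of the edition; `text_content` is a hypothetical helper name, and the fragment is copied from this issue with the line breaks inside the outer <mod> kept.

```python
import xml.etree.ElementTree as ET

# Fragment from this issue; the line breaks inside the outer <mod> are kept as in the file.
FRAGMENT = (
    '<hi rend="underline">Sph<anchor xml:id="Sphynx"/>'
    '<mod rend="strikethrough" hand="#g_t">\n'
    '<mod rend="strikethrough" hand="#g_bl">y</mod>\n'
    '</mod>nx.</hi>'
)


def text_content(xml_source: str) -> str:
    """Concatenate all text nodes in document order, whitespace included."""
    return "".join(ET.fromstring(xml_source).itertext())


raw = text_content(FRAGMENT)
print(repr(raw))            # the two line breaks sit inside the word
print(" ".join(raw.split()))  # what a whitespace-normalising renderer would show
```

Any consumer that treats these text nodes as content and collapses whitespace runs to a single blank would end up with the fragmented word shown in the screenshot.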

No, this is the relevant fragment at the start of git tracking on 2014-10-01:

            <ge:line xml:id="ld" rend="centered">
               <handShift new="#jo_t"/>
               <hi rend="underline">Sph<anchor xml:id="Sphynx"/><f:st hand="#g_t">
                     <f:st hand="#g_bl">i</f:st>
                  </f:st>nx.</hi>
            </ge:line>

Thanks for finding this out. So this is prehistoric whitespace!
It's a pity that we will never find out whether this was introduced together with the famous loss of significant whitespace we suffered early on.

BTW, there are rule sets, like the one for XSLT literal result elements, under which this whitespace is irrelevant, so it might well be the result of some auto-formatting etc.

"some auto-formatting etc."

In Oxygen when opening or editing a file? Or in serialisation after some prehistoric batch processing, or possibly both?

Could we think about automatically detecting and removing whitespace at the beginning and end of each line element?

     <line>
        <handShift/>...
     </line>
     <line>
        <hi>...</hi>
     </line>
     <line>
        <abbr>...</abbr>
     </line>

We know that this whitespace can't be significant.
It would be especially important to fix cases like

        a<mod>
           <mod>b</mod>
        </mod>c

There the whitespace was added inside a token, which is much uglier than redundant blanks around line content. The latter could also be ignored by the layout algorithm, but IIRC we never discussed this, and it doesn't seem to happen.

Removing all whitespace at the beginning and end of line elements, provided the respective text node is a child of line, shouldn't be a problem.
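A minimal sketch of that rule, assuming Python's standard `xml.etree.ElementTree` rather than whatever tooling the project would actually use. `trim_line_edges` and the plain `line` tag are illustrative names; for the real files the namespaced tag would have to be passed in Clark notation (`{namespace-uri}line`). Only text nodes that are direct children of the line are touched, so whitespace inside child elements stays as it is.

```python
import xml.etree.ElementTree as ET


def trim_line_edges(root: ET.Element, line_tag: str = "line") -> None:
    """Strip whitespace at the start and end of every `line_tag` element, in place."""
    for line in root.iter(line_tag):
        # Leading text node: the text before the first child element.
        if line.text:
            line.text = line.text.lstrip()
        if len(line):
            # Trailing text node: the tail of the last child element.
            last = line[-1]
            if last.tail:
                last.tail = last.tail.rstrip()
        elif line.text:
            # No child elements: the single text node is also the trailing one.
            line.text = line.text.rstrip()


doc = ET.fromstring(
    "<div>\n"
    "  <line>\n    <hi>Sph</hi>nx.\n  </line>\n"
    "  <line>\n    plain text\n  </line>\n"
    "</div>"
)
trim_line_edges(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The indentation between the line elements themselves is left alone, since those text nodes belong to the parent and not to a line.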

For the mod use case we would need a detailed spec of what to remove, when, and what to keep.
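Until such a spec exists, a report-only pass might help to collect the candidates. The sketch below is a heuristic and an assumption on my part, not an agreed rule: inside each line it flags whitespace-only text nodes that contain a line break and sit directly between two non-whitespace characters. `text_nodes`, `suspicious_whitespace` and the plain `line` tag are hypothetical names, and nothing is modified.

```python
import xml.etree.ElementTree as ET


def text_nodes(elem):
    """Yield (element, slot, value) for each text node below elem, in document order.

    slot is "text" (before the first child) or "tail" (directly after the element).
    """
    if elem.text:
        yield elem, "text", elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield child, "tail", child.tail


def suspicious_whitespace(root, line_tag="line"):
    """Report (tag, slot) for indentation-like whitespace nodes inside a token."""
    hits = []
    for line in root.iter(line_tag):
        nodes = list(text_nodes(line))
        for i, (elem, slot, value) in enumerate(nodes):
            if value.strip() or "\n" not in value:
                continue  # has real content, or is not a line-break whitespace node
            before = nodes[i - 1][2] if i > 0 else ""
            after = nodes[i + 1][2] if i + 1 < len(nodes) else ""
            # Flag only if the neighbouring text touches the gap without any blank.
            if before and after and not before[-1].isspace() and not after[0].isspace():
                hits.append((elem.tag, slot))
    return hits


doc = ET.fromstring(
    "<div>\n<line>\n  a<mod>\n    <mod>b</mod>\n  </mod>c\n</line>\n</div>"
)
print(suspicious_whitespace(doc))
```

Note that this would also flag a line break between two sibling elements that are meant to be separate words, which is exactly the kind of case the spec would have to decide.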