eeditiones/tei-publisher-lib

Resolving hyperlinks in docx footnotes and endnotes

joewiz opened this issue

tei-publisher-lib incorrectly resolves hyperlinks contained in footnotes, endnotes, and other document components, yielding the wrong link target.

In OOXML WordprocessingML, hyperlink targets are stored in files in the word/_rels folder. Links in the document body are stored in document.xml.rels, those in footnotes are stored in footnotes.xml.rels, and endnotes in endnotes.xml.rels. The Relationship/@Id attribute values are not unique across these various .rels documents, so there can be ID collisions if the source of the hyperlink (body vs. footnote vs. endnote) is not checked.

The attached revision of tei-publisher-app's test.docx adds a pair of test footnotes and endnotes with links to demonstrate this issue.

To better understand the issue, let's look at one of the existing hyperlinks in the document's body, which already resolves correctly:

You can also download it from the TEI Publisher git repository, ...

In the OOXML, this appears inline, and its ancestor is <w:body>:

<w:hyperlink r:id="rId8" w:history="1">
    <w:r w:rsidR="00E20685">
        <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
            <w:lang w:val="en-US"/>
        </w:rPr>
        <w:t>TEI Publisher git repository</w:t>
    </w:r>
</w:hyperlink>

The <w:hyperlink> element's r:id attribute points to a relationship stored in _rels/document.xml.rels:

<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" 
    Target="https://github.com/eeditiones/tei-publisher-app/blob/master/data/doc/test.docx" TargetMode="External"/>
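Reading such a .rels file amounts to building an Id-to-Target map for that one file. A minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming the .rels content is available as a string (the helper `load_relationships` is hypothetical; the sample data is the relationship above):

```python
import xml.etree.ElementTree as ET

# Namespace used by all OPC .rels parts.
PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/hyperlink")


def load_relationships(rels_xml: str) -> dict:
    """Map each Relationship/@Id in ONE .rels part to (Type, Target, TargetMode)."""
    root = ET.fromstring(rels_xml)
    result = {}
    for rel in root.findall(f"{{{PKG_RELS_NS}}}Relationship"):
        # TargetMode defaults to "Internal" when the attribute is absent.
        result[rel.get("Id")] = (rel.get("Type"), rel.get("Target"),
                                 rel.get("TargetMode", "Internal"))
    return result


DOCUMENT_RELS = f"""<Relationships xmlns="{PKG_RELS_NS}">
  <Relationship Id="rId8" Type="{HYPERLINK_TYPE}"
    Target="https://github.com/eeditiones/tei-publisher-app/blob/master/data/doc/test.docx"
    TargetMode="External"/>
</Relationships>"""

rels = load_relationships(DOCUMENT_RELS)
rel_type, target, mode = rels["rId8"]
print(target, mode)
```

Because the map is built from a single .rels file, it is only valid for hyperlinks in the part that file belongs to, here word/document.xml.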

In contrast, the hyperlink in footnote 4:

And here we have a footnote with a link to an external URL.

... doesn't appear in the body at all, but in footnotes.xml. The only reference to the footnote in document.xml is the following, and it contains no hyperlink:

<w:r w:rsidR="00AB161D">
    <w:rPr>
        <w:rStyle w:val="FootnoteReference"/>
    </w:rPr>
    <w:footnoteReference w:id="6"/>
</w:r>

We have to look to footnotes.xml to find the content of the footnote, and its hyperlink ID:

<w:footnote w:id="6">
    <w:p w14:paraId="5FF36D79" w14:textId="1F50E494" w:rsidR="00AB161D" w:rsidRPr="00AB161D"
        w:rsidRDefault="00AB161D">
        <w:pPr>
            <w:pStyle w:val="FootnoteText"/>
            <w:rPr>
                <w:lang w:val="en-US"/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:rStyle w:val="FootnoteReference"/>
            </w:rPr>
            <w:footnoteRef/>
        </w:r>
        <w:r>
            <w:t xml:space="preserve"> </w:t>
        </w:r>
        <w:r>
            <w:rPr>
                <w:lang w:val="en-US"/>
            </w:rPr>
            <w:t xml:space="preserve">And here we have a footnote with a </w:t>
        </w:r>
        <w:hyperlink r:id="rId1" w:history="1">
            <w:r w:rsidRPr="00AB161D">
                <w:rPr>
                    <w:rStyle w:val="Hyperlink"/>
                    <w:lang w:val="en-US"/>
                </w:rPr>
                <w:t>link</w:t>
            </w:r>
        </w:hyperlink>
        <w:r>
            <w:rPr>
                <w:lang w:val="en-US"/>
            </w:rPr>
            <w:t xml:space="preserve"> to an external URL.</w:t>
        </w:r>
    </w:p>
</w:footnote>
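Locating the footnote's hyperlinks can be sketched in Python as well: find the <w:footnote> whose w:id matches the w:footnoteReference (here 6) and collect each <w:hyperlink>'s r:id along with its text. The function `footnote_hyperlinks` is a hypothetical helper, and the XML is a trimmed copy of the footnote above:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Trimmed copy of the footnote shown above (run properties removed).
FOOTNOTES_XML = f"""<w:footnotes xmlns:w="{W}" xmlns:r="{R}">
  <w:footnote w:id="6">
    <w:p>
      <w:r><w:t xml:space="preserve">And here we have a footnote with a </w:t></w:r>
      <w:hyperlink r:id="rId1" w:history="1"><w:r><w:t>link</w:t></w:r></w:hyperlink>
      <w:r><w:t xml:space="preserve"> to an external URL.</w:t></w:r>
    </w:p>
  </w:footnote>
</w:footnotes>"""


def footnote_hyperlinks(footnotes_xml: str, footnote_id: str) -> list:
    """Return (r:id, link text) pairs for hyperlinks in the footnote with this w:id."""
    root = ET.fromstring(footnotes_xml)
    for footnote in root.findall(f"{{{W}}}footnote"):
        if footnote.get(f"{{{W}}}id") == footnote_id:
            return [
                (link.get(f"{{{R}}}id"),
                 "".join(t.text or "" for t in link.iter(f"{{{W}}}t")))
                for link in footnote.iter(f"{{{W}}}hyperlink")
            ]
    return []  # no footnote with that id


print(footnote_hyperlinks(FOOTNOTES_XML, "6"))  # [('rId1', 'link')]
```

The rId1 returned here is only meaningful relative to footnotes.xml.rels, the .rels file belonging to the part it was read from.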

The hyperlink's ancestor is <w:footnote> rather than <w:body>, and its ID is @r:id="rId1". However, tei-publisher-lib mistakenly looks up this ID in _rels/document.xml.rels, where rId1 identifies an unrelated customXml relationship rather than a hyperlink:

<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
    Target="../customXml/item1.xml"/>

... and renders in the current TEI output:

<ref target="../customXml/item1.xml">
    <hi rend="u">link</hi>
</ref>
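The faulty behavior can be modeled in a few lines of Python. This is a hypothetical reconstruction of the symptom, not tei-publisher-lib's code: `buggy_resolve` consults only document.xml.rels no matter where the hyperlink appears. `checked_resolve` adds a guard on the relationship Type, which would at least catch this case, since rId1 in document.xml.rels is a customXml relationship:

```python
HYPERLINK = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
CUSTOM_XML = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"

# Relationships from word/_rels/document.xml.rels, keyed by Id: (Type, Target).
DOCUMENT_RELS = {
    "rId1": (CUSTOM_XML, "../customXml/item1.xml"),
    "rId8": (HYPERLINK, "https://github.com/eeditiones/tei-publisher-app/blob/master/data/doc/test.docx"),
}


def buggy_resolve(r_id: str) -> str:
    """Hypothetical model of the bug: always resolve against document.xml.rels."""
    return DOCUMENT_RELS[r_id][1]


def checked_resolve(r_id: str, rels: dict) -> str:
    """Resolve r_id in the given part's relationships, rejecting non-hyperlink types."""
    rel_type, target = rels[r_id]
    if rel_type != HYPERLINK:
        kind = rel_type.rsplit("/", 1)[-1]
        raise ValueError(f"{r_id} is a {kind} relationship, not a hyperlink")
    return target


print(buggy_resolve("rId1"))  # ../customXml/item1.xml
try:
    checked_resolve("rId1", DOCUMENT_RELS)
except ValueError as err:
    print(err)  # rId1 is a customXml relationship, not a hyperlink
```

A Type check only detects the symptom; it cannot recover the correct target, which requires consulting the .rels file of the part the hyperlink came from.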

Instead, tei-publisher-lib should look up rId1 in _rels/footnotes.xml.rels:

<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
    Target="https://e-editiones.org/" TargetMode="External"/>

If tei-publisher-lib correctly looked up the hyperlink ID in this file, the expected TEI for this footnote's hyperlink would be:

<ref target="https://e-editiones.org/">
    <hi rend="u">link</hi>
</ref>
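An end-to-end sketch of the expected behavior, in Python with only the standard library: it builds a small in-memory .docx containing the two colliding rId1 relationships from this issue, then resolves each hyperlink in word/footnotes.xml against word/_rels/footnotes.xml.rels and emits a TEI <ref>. The helpers `rels_part_for`, `hyperlink_refs`, and `build_sample_docx` are hypothetical and only illustrate the approach; the actual fix belongs in tei-publisher-lib's own code.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK = R + "/hyperlink"


def rels_part_for(part_name):
    """OPC convention: relationships of dir/name.xml live in dir/_rels/name.xml.rels."""
    directory, filename = posixpath.split(part_name)
    return posixpath.join(directory, "_rels", filename + ".rels")


def hyperlink_refs(docx, part_name):
    """Return TEI <ref> strings for every w:hyperlink in part_name, resolving
    each r:id against that part's OWN .rels file."""
    with zipfile.ZipFile(docx) as zf:
        part = ET.fromstring(zf.read(part_name))
        rels_root = ET.fromstring(zf.read(rels_part_for(part_name)))
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels_root.findall(f"{{{PKG}}}Relationship")
               if rel.get("Type") == HYPERLINK}
    refs = []
    for link in part.iter(f"{{{W}}}hyperlink"):
        r_id = link.get(f"{{{R}}}id")
        if r_id is None:  # internal links to bookmarks use w:anchor instead of r:id
            continue
        ref = ET.Element("ref", target=targets[r_id])
        hi = ET.SubElement(ref, "hi", rend="u")
        hi.text = "".join(t.text or "" for t in link.iter(f"{{{W}}}t"))
        refs.append(ET.tostring(ref, encoding="unicode"))
    return refs


def build_sample_docx():
    """Minimal in-memory package reproducing the rId1 collision from this issue."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/_rels/document.xml.rels",
                    f'<Relationships xmlns="{PKG}"><Relationship Id="rId1" '
                    f'Type="{R}/customXml" Target="../customXml/item1.xml"/></Relationships>')
        zf.writestr("word/_rels/footnotes.xml.rels",
                    f'<Relationships xmlns="{PKG}"><Relationship Id="rId1" Type="{HYPERLINK}" '
                    f'Target="https://e-editiones.org/" TargetMode="External"/></Relationships>')
        zf.writestr("word/footnotes.xml",
                    f'<w:footnotes xmlns:w="{W}" xmlns:r="{R}"><w:footnote w:id="6"><w:p>'
                    f'<w:hyperlink r:id="rId1"><w:r><w:t>link</w:t></w:r></w:hyperlink>'
                    f'</w:p></w:footnote></w:footnotes>')
    buf.seek(0)
    return buf


print(hyperlink_refs(build_sample_docx(), "word/footnotes.xml"))
# ['<ref target="https://e-editiones.org/"><hi rend="u">link</hi></ref>']
```

The same function would handle endnotes by passing "word/endnotes.xml", because the .rels file is derived from whichever part the hyperlink was read from rather than being fixed to document.xml.rels.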

I believe the fix needs to take place here: