Support for comments

timnieder opened this issue 6 months ago · 0 comments

We encountered an issue where the library fails to read a file whose document.xml contains an XML comment:

TypeError: Cannot read properties of undefined (reading 'filter')
    at easy-template-x.js:1907:47
    at Generator.next (<anonymous>)
    at asyncGeneratorStep (asyncToGenerator.js:3:1)
    at _next (asyncToGenerator.js:22:1)
    at _ZoneDelegate.invoke (zone.js:368:26)
    at Object.onInvoke (core.mjs:11083:33)
    at _ZoneDelegate.invoke (zone.js:367:52)
    at Zone.run (zone.js:129:43)
    at zone.js:1257:36
    at _ZoneDelegate.invokeTask (zone.js:402:31)

The failure is in getHeaderOrFooter, where the node it takes to be the body is not the body node and has no childNodes:

async getHeaderOrFooter(type) {
    var _sectionProps$childNo, _attributes;
    const nodeName = this.headerFooterNodeName(type);
    const nodeTypeAttribute = this.headerFooterType(type);

    // find the last section properties
    // see: http://officeopenxml.com/WPsection.php
    const docRoot = await this.mainDocument.xmlRoot();
    const body = docRoot.childNodes[0];
    const sectionProps = last(body.childNodes.filter(node => node.nodeType === XmlNodeType.General));
    if (sectionProps.nodeName != 'w:sectPr') return null;

A look into the document.xml shows the issue:

<w:document ...
    mc:Ignorable="w14 w15 wp14"><!-- Generated by Aspose.Words for .NET 23.5.0 -->
    <w:body>
        <w:p w:rsidR="001F47F4" w:rsidP="001F47F4" w14:paraId="425762CC" w14:textId="77777777">
....

The library assumes the first child node of the root is the body, but this is not always the case (e.g. there could be a comment before it or, as in #103, a w:background tag).

A file generated with the Aspose.Words online editor has the same problem:
test.docx

I've tried to fix it by changing the code:

private async getHeaderOrFooter(type: ContentPartType): Promise<XmlPart> {

    const nodeName = this.headerFooterNodeName(type);
    const nodeTypeAttribute = this.headerFooterType(type);

    // find the last section properties
    // see: http://officeopenxml.com/WPsection.php
    const docRoot = await this.mainDocument.xmlRoot();
    const body = docRoot.childNodes.find(node => node.nodeName == 'w:body');
    if (body == null)
        return null;

    const sectionProps = last(body.childNodes.filter(node => node.nodeType === XmlNodeType.General));
    if (sectionProps.nodeName != 'w:sectPr')
        return null;

This allows the library to parse the comments without a problem, but now they are detected as text nodes (I believe) and written into the final document like this:

<w:document ...
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    mc:Ignorable="w14 w15 wp14">
    <#comment/>
    <w:body>
        <w:p w:rsidR="001F47F4" w:rsidP="001F47F4" w14:paraId="425762CC" w14:textId="77777777">
...

Word fails to open the result, because <#comment/> is not a valid XML element and the document is therefore invalid.

To make a proper fix, one would probably have to add full support for comments.
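
As a rough sketch of what comment support could look like, the snippet below uses a hypothetical node model (the XmlNode type, serialize and findBody are made up for illustration and are not easy-template-x's actual types or API). The idea is that comments get their own node kind, are written back as <!-- ... --> instead of as an element named #comment, and are skipped when looking up w:body by name:

```typescript
// Hypothetical node model for illustration only; not the library's real types.
type XmlNode =
    | { kind: "element"; name: string; attributes: Record<string, string>; children: XmlNode[] }
    | { kind: "text"; value: string }
    | { kind: "comment"; value: string };

function escapeText(value: string): string {
    return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttribute(value: string): string {
    return escapeText(value).replace(/"/g, "&quot;");
}

// Serialize a node tree, giving comments their own branch.
function serialize(node: XmlNode): string {
    switch (node.kind) {
        case "text":
            return escapeText(node.value);
        case "comment":
            // Assumes the value came from a parsed comment, so it contains no "--".
            return `<!--${node.value}-->`;
        case "element": {
            const attrs = Object.entries(node.attributes)
                .map(([key, value]) => ` ${key}="${escapeAttribute(value)}"`)
                .join("");
            if (node.children.length === 0) return `<${node.name}${attrs}/>`;
            const inner = node.children.map(serialize).join("");
            return `<${node.name}${attrs}>${inner}</${node.name}>`;
        }
    }
}

// Look up the body by name rather than by position, so leading comments
// (or other siblings such as w:background) do not matter.
function findBody(root: XmlNode): XmlNode | undefined {
    if (root.kind !== "element") return undefined;
    return root.children.find(child => child.kind === "element" && child.name === "w:body");
}

const doc: XmlNode = {
    kind: "element",
    name: "w:document",
    attributes: {},
    children: [
        { kind: "comment", value: " Generated by Aspose.Words for .NET 23.5.0 " },
        { kind: "element", name: "w:body", attributes: {}, children: [] },
    ],
};

console.log(serialize(doc));
```

With this model the comment round-trips unchanged and the body is still found. Whether the parser should keep comments at all, or simply drop them while reading, is a separate design choice; dropping them would be simpler but changes the user's file.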