jhuckaby/pixl-xml

parsing node content that contains xml/html-style tags


Given the following XML:

<xml>
  <node>some data</node>
  <italics>some <i>italic</i> data</italics>
</xml>

Parsing it, I get:

{
    "node": "some data",
    "italics": {
        "_Data": "some data",
        "i": "italic"
    }
}

I suppose this output makes sense, but is there an option to get the following instead?

{
    "node": "some data",
    "italics": {
        "_Data": "some <i>italic</i> data"
    }
}

Of course, I could replace the <i></i> tags in the string before parsing (I don't need them preserved), but it would be neat to have an option to alter that behaviour if possible.
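A minimal sketch of that workaround, assuming the inline tags carry nothing you need to keep. `stripInlineTags` is a hypothetical helper, not part of pixl-xml. It removes only the named tags with a regular expression before the string is handed to the parser, and leaves their text in place.

```javascript
// Hypothetical helper (not part of pixl-xml): removes the opening, closing,
// and self-closing forms of the named inline tags, keeping the text between them.
function stripInlineTags(xml, tagNames) {
  // Escape any regex metacharacters in the tag names.
  const names = tagNames.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // \b stops "i" from also matching longer names such as <italics> or <img>.
  const re = new RegExp('</?(?:' + names.join('|') + ')\\b[^>]*>', 'g');
  return xml.replace(re, '');
}

const input =
  '<xml>\n' +
  '  <node>some data</node>\n' +
  '  <italics>some <i>italic</i> data</italics>\n' +
  '</xml>';

// The <italics> element now holds plain text: "some italic data".
console.log(stripInlineTags(input, ['i']));
```

After this step, `italics` has no child elements, so it should parse as plain text in the same way `node` does.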

Yeah, I do apologize, but this behavior is as designed. This library really isn't designed to parse HTML or HTML-like complex elements with mixed text and child elements at the same level in the hierarchy. See Issue #17 for a more detailed explanation.

There is really no easy way to fix this because of how the library was designed: it uses nested recursive regular expressions. The only way to get what you want would be to pre-entitize the <i> and </i> tags into &lt;i&gt; and &lt;/i&gt; respectively before parsing.
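The pre-entitizing step might look roughly like this. `entitizeInlineTags` is a hypothetical helper, not a pixl-xml API. It escapes only the named tags, so the parser should see them as text rather than as child elements. A simple regex like this one does not handle a `>` inside an attribute value.

```javascript
// Hypothetical helper (not part of pixl-xml): rewrites the named inline tags
// as &lt;...&gt; so they are no longer parsed as child elements.
function entitizeInlineTags(xml, tagNames) {
  // Escape any regex metacharacters in the tag names.
  const names = tagNames.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Capture the optional "/" plus the tag name and any attributes, so both
  // <i> and </i> are kept intact inside the escaped text.
  const re = new RegExp('<(/?(?:' + names.join('|') + ')\\b[^>]*)>', 'g');
  return xml.replace(re, '&lt;$1&gt;');
}

const input =
  '<xml>\n' +
  '  <node>some data</node>\n' +
  '  <italics>some <i>italic</i> data</italics>\n' +
  '</xml>';

// <italics> now contains: some &lt;i&gt;italic&lt;/i&gt; data
console.log(entitizeInlineTags(input, ['i']));
```

You would then pass the result to `XML.parse()`. If the parser decodes `&lt;` and `&gt;` in text content, as the suggestion above implies, `italics` should come back as the text `some <i>italic</i> data`.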

I am sorry for this shortcoming, but the library was only designed to parse simple XML configuration files into a simple hash/array tree.