
How can I use this library to convert a tag-balanced HTML fragment into a node list idiomatically, reliably and 1:1?


What is the idiomatic way to use this library to convert a tag-balanced HTML fragment in a string into a node list, in a reliable 1:1 manner that doesn't require checking for multiple corner cases?

$nodeList = what_goes_here("Some text <span>a tag</span> some more text");

// $nodeList should now contain the exact structure [ TEXT, <span> [ TEXT ] </span>, TEXT ]
// as starkly opposed to [ <p> [ TEXT, <span> [ TEXT ] </span>, TEXT ] </p> ]
// which is what I obtain from ->create("Some text <span>a tag</span> some more text")

EDIT: The underlying problem seems to be that LIBXML_HTML_NOIMPLIED cannot be set as a global policy. Even if you set the option after creating the document and before loading contents, various manipulation functions internally create other document objects for processing and do not propagate LIBXML_HTML_NOIMPLIED to them. It looks like they could not do so even in principle, because there is no Document::getLibxmlOptions().
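
For comparison, here is a minimal sketch of the behaviour I am after, written against PHP's built-in DOMDocument rather than this library's API. The helper name fragmentToNodes() is hypothetical, and wrapping the fragment in a throwaway <div> is an assumption on my part: it is there so that libxml sees a single root element and has no reason to add an implied <p> around the leading text. I have not checked it against every kind of fragment.

```php
<?php
// Sketch only: uses the built-in DOM extension, not this library.
// fragmentToNodes() is a hypothetical helper name.
function fragmentToNodes(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // The temporary <div> gives the parser exactly one root element.
    // NOIMPLIED suppresses the implied <html>/<body>; NODEFDTD suppresses
    // the default doctype.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The children of the wrapper are the top-level nodes of the fragment.
    return iterator_to_array($doc->documentElement->childNodes);
}

$nodeList = fragmentToNodes("Some text <span>a tag</span> some more text");
```

This works only because the flags are passed at the single place where the HTML is parsed. It does not address the case described above, where the library creates further documents internally without those flags.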