Do we change the DOMDocument instance that gets passed in, and is this an issue?
Zegnat opened this issue · 2 comments
See microformats/mf2py#104. For backwards compatibility parsing, the Python parser changes the DOM on the fly. I believe the PHP parser does a similar thing. It turns out that – in the case of the Python parser – the same DOM object can’t be parsed successfully a second time. The microformats in the base document have been “damaged”.
How can we best test if this is the case with our parser too? Maybe also add a test case where we check that a second parse gives the same result?
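A rough sketch of what such a check could look like, using only PHP's DOM extension. The helper name `checkRepeatableParse` and the stand-in parser `$countHentry` are hypothetical and exist only for illustration; with the library installed, the callable `'Mf2\parse'` could presumably be passed in their place.

```php
<?php
// Hypothetical helper, not part of php-mf2: reports whether $parse leaves
// $doc untouched and whether a second parse returns the same result.
function checkRepeatableParse(callable $parse, DOMDocument $doc): array
{
    $before = $doc->saveHTML();   // serialized DOM before any parsing
    $first  = $parse($doc);
    $after  = $doc->saveHTML();   // serialized DOM after the first parse
    $second = $parse($doc);

    return [
        'dom_unchanged' => $before === $after,
        'same_result'   => $first == $second,
    ];
}

// Stand-in parser for demonstration only: counts elements whose class
// list contains "hentry". It does not modify the document.
$countHentry = function (DOMDocument $doc): array {
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " hentry ")]'
    );
    return ['hentry' => $nodes->length];
};

$doc = new DOMDocument();
$doc->loadHTML('<div class="hentry"><p class="entry-summary">This is a summary</p></div>');

$report = checkRepeatableParse($countHentry, $doc);
```

If the parser under test modifies the document, `dom_unchanged` would come back false, and `same_result` would show whether that modification also affects a second parse.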
Needs investigating. Thanks @kartikprabhu for bringing this up!
(This is basically a todo for myself, therefore also assigning myself.)
Confirmed in php-mf2: if you pass in a DOMDocument, it is modified during parsing.
Input HTML:
<div class="hentry">
<div class="entry-content">
<p class="entry-summary">This is a summary</p>
<p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
</div>
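</div>

$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the input HTML above
echo $doc->saveHTML(); // document before parsing
$parse = Mf2\parse($doc);
echo $doc->saveHTML(); // same document after parsing

Output (doctype and html, body elements trimmed), first before and then after parsing: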
<div class="hentry">
<div class="entry-content">
<p class="entry-summary">This is a summary</p>
<p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
</div>
</div>
<div class="hentry h-entry">
<div class="entry-content e-content">
<p class="entry-summary p-summary">This is a summary</p>
<p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
</div>
<data class="category p-category" value="mytag"></data></div>

Appears to be a simple fix: $doc = clone $input; at this line. Only tested locally with the above HTML.
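A minimal sketch of why cloning helps, assuming a parser that upgrades classic class names in place. Both `addBackcompatClasses` and `parseWithoutSideEffects` are hypothetical stand-ins written for this illustration, not php-mf2 functions; only the `clone $input` idea comes from the comment above.

```php
<?php
// Hypothetical stand-in for the parser's backcompat step: it adds an mf2
// class next to a classic one, mutating whatever document it is given.
function addBackcompatClasses(DOMDocument $doc): void
{
    foreach ($doc->getElementsByTagName('div') as $el) {
        if ($el->getAttribute('class') === 'hentry') {
            $el->setAttribute('class', 'hentry h-entry');
        }
    }
}

// Sketch of the proposed fix: work on a clone so that the caller's
// document is left as it was passed in.
function parseWithoutSideEffects(DOMDocument $input): string
{
    $doc = clone $input;          // cloning a DOMDocument copies its whole tree
    addBackcompatClasses($doc);   // only the copy is modified
    return $doc->saveHTML();
}

$input = new DOMDocument();
$input->loadHTML('<div class="hentry"><p class="entry-summary">This is a summary</p></div>');

$before   = $input->saveHTML();
$upgraded = parseWithoutSideEffects($input);
$after    = $input->saveHTML();
```

Here `$before` and `$after` should be identical, while `$upgraded` carries the added `h-entry` class. The trade-off is an extra copy of the document per parse.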