microformats/php-mf2

Do we change the DOMDocument instance that gets passed in, and is this an issue?

Zegnat opened this issue · 2 comments

See microformats/mf2py#104. For backwards compatibility parsing, the Python parser changes the DOM on the fly. I believe the PHP parser does a similar thing. It turns out that – in the case of the Python parser – the same DOM object can’t be parsed successfully a second time. The microformats in the base document have been “damaged”.

How can we best test if this is the case with our parser too? Maybe also add a test case where we check that a second parse gives the same result?
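One possible shape for such a check, as a minimal sketch: serialize the document before and after a parse, then parse a second time and compare results. `checkRepeatParse` and `mutatingParser` are hypothetical names for illustration only; `mutatingParser` is a stand-in that upgrades a classic class name in place and is not the real php-mf2 parser.

```php
<?php
// Hypothetical helper (not part of php-mf2): runs $parse twice on the same
// DOMDocument and reports whether the document or the parse result changed.
function checkRepeatParse(DOMDocument $doc, callable $parse): array
{
    $before = $doc->saveHTML();
    $first  = $parse($doc);
    $after  = $doc->saveHTML();
    $second = $parse($doc);

    return [
        'documentUnchanged' => $before === $after,
        'sameResult'        => $first == $second,
    ];
}

// Stand-in parser (an assumption, not the real one): records every div's
// class attribute, then rewrites "hentry" to "hentry h-entry" in place.
function mutatingParser(DOMDocument $doc): array
{
    $seen = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        $class  = $div->getAttribute('class');
        $seen[] = $class;
        if ($class === 'hentry') {
            $div->setAttribute('class', 'hentry h-entry');
        }
    }
    return $seen;
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="hentry">x</div>');

$report = checkRepeatParse($doc, 'mutatingParser');
var_dump($report['documentUnchanged']); // bool(false): the DOM was modified
var_dump($report['sameResult']);        // bool(false): the second parse differs
```

In a real test case the callable would presumably wrap Mf2\parse, and both flags would be expected to be true once the parser stops touching its input.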

Needs investigating. Thanks @kartikprabhu for bringing this up!

(This is basically a todo for myself, therefore also assigning myself.)

Confirmed in php-mf2: if you pass in a DOMDocument, it is modified during parsing.

Input HTML:

<div class="hentry">
    <div class="entry-content">
        <p class="entry-summary">This is a summary</p> 
        <p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
    </div>
</div>
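
Test script:

// $html holds the input HTML above
$doc = new DOMDocument();
$doc->loadHTML($html);
echo $doc->saveHTML();      // serialized document before parsing
$parse = Mf2\parse($doc);
echo $doc->saveHTML();      // serialized document after parsing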

Output before and after parsing (doctype, html, and body elements trimmed):

<div class="hentry">
    <div class="entry-content">
        <p class="entry-summary">This is a summary</p> 
        <p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
    </div>
</div>

<div class="hentry h-entry">
    <div class="entry-content e-content">
        <p class="entry-summary p-summary">This is a summary</p> 
        <p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
    </div>
<data class="category p-category" value="mytag"></data></div>

This appears to be a simple fix: $doc = clone $input; at this line. So far I have only tested it locally with the HTML above.
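To illustrate why the clone would help, here is a minimal sketch. `parseWithoutSideEffects` is a hypothetical stand-in, not the parser's actual code; it only mimics the class-name upgrade shown above so the effect of working on a copy is visible.

```php
<?php
// Hypothetical stand-in for the parser's entry point. The only point of
// interest is the clone: all mutation happens on the copy, not on $input.
function parseWithoutSideEffects(DOMDocument $input): string
{
    $doc = clone $input; // cloning a DOMDocument copies the whole tree

    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('class') === 'hentry') {
            // Mimic the backcompat upgrade seen in the output above
            $div->setAttribute('class', 'hentry h-entry');
        }
    }

    return $doc->saveHTML();
}

$input = new DOMDocument();
$input->loadHTML('<div class="hentry">x</div>');

$before   = $input->saveHTML();
$upgraded = parseWithoutSideEffects($input);

var_dump($before === $input->saveHTML());         // bool(true): caller's document untouched
var_dump(strpos($upgraded, 'h-entry') !== false); // bool(true): the copy was upgraded
```

The trade-off is an extra copy of the document per parse, which seems acceptable next to handing callers back a silently modified DOM.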