microformats/php-mf2

Fix backcompat hfeed parsing

gRegorLove opened this issue · 1 comments

Using the example in microformats/microformats2-parsing#11 (comment), running just the hfeed element through the parser seems to incorrectly pick up properties from inside the entry-title and entry-content elements, even though there is no intervening hentry:

<div id="page" class="hfeed site wrap">
	<h1 class="entry-title"><span class='p-name'>title</span></h1>
	other content
	<div class="entry-content">
		<div class="e-content">this is a test for indieweb post </div> <span class="syn-text">Also on:</span>
		<!--syndication links -->
	</div>
</div>

currently parses as:

{
    "items": [
        {
            "type": [
                "h-feed"
            ],
            "properties": {
                "name": [
                    "title"
                ],
                "content": [
                    {
                        "html": "this is a test for indieweb post ",
                        "value": "this is a test for indieweb post"
                    }
                ]
            }
        }
    ],
    "rels": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "version": "v0.3.2",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
        ]
    }
}

I would expect properties to be empty in the parsed result.

Looking at this more closely, it's an issue I ran into while improving the backcompat parsing (#111). Ideally, the parser needs to distinguish between 1) mf2 properties that were explicitly authored inside mf1 roots and 2) mf1 properties that have been upgraded to mf2.

Currently php-mf2 doesn't do that. After running the backcompat algorithm and finding no hfeed properties to upgrade, it adds the h-feed class to the hfeed root and continues to parse it as mf2. Thus it parses the p-name and e-content even though it shouldn't. It is not aware of whether those elements were upgraded or authored that way.
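To make the distinction concrete, here is a minimal Python sketch of one possible approach: record the classes added by the backcompat pass separately from the authored ones, and only read from that record while inside an mf1 root. This is not how php-mf2 is implemented. The Element class, the BACKCOMPAT table, and both functions are hypothetical names invented for illustration, the table lists only the entries needed for this example, and nested roots and real value parsing are ignored.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical, heavily reduced backcompat table (mf1 root -> mf1-to-mf2
# property mapping). Only the entries needed for this example are listed.
BACKCOMPAT = {
    "hfeed": {},
    "hentry": {"entry-title": "p-name", "entry-content": "e-content"},
}

MF2_PREFIXES = ("p-", "u-", "dt-", "e-")


@dataclass
class Element:
    classes: list[str]
    children: list[Element] = field(default_factory=list)
    text: str = ""
    # mf2 classes added by the backcompat pass, kept apart from authored ones
    upgraded: set[str] = field(default_factory=set)


def upgrade(root: Element, mf1_root: str) -> None:
    """Record the mf2 equivalent of each mf1 property class under root."""
    mapping = BACKCOMPAT[mf1_root]
    stack = list(root.children)
    while stack:
        el = stack.pop()
        el.upgraded.update(mapping[c] for c in el.classes if c in mapping)
        stack.extend(el.children)  # sketch: nested roots are not handled


def collect_properties(root: Element, backcompat: bool) -> dict[str, list[str]]:
    """Collect properties; in backcompat mode, trust only upgraded classes."""
    props: dict[str, list[str]] = {}
    queue = list(root.children)
    while queue:
        el = queue.pop(0)
        candidates = el.upgraded if backcompat else set(el.classes)
        for cls in sorted(candidates):
            if cls.startswith(MF2_PREFIXES):
                # sketch: the element text stands in for real value parsing
                props.setdefault(cls.split("-", 1)[1], []).append(el.text)
        queue.extend(el.children)
    return props


# The hfeed example from this issue, reduced to the relevant elements.
page = Element(["hfeed", "site", "wrap"], [
    Element(["entry-title"], [Element(["p-name"], text="title")]),
    Element(["entry-content"], [
        Element(["e-content"], text="this is a test for indieweb post"),
    ]),
])
upgrade(page, "hfeed")

as_mf2 = collect_properties(page, backcompat=False)  # authored classes are read
as_mf1 = collect_properties(page, backcompat=True)   # only upgraded classes are read
```

In this sketch, as_mf2 contains the name and content properties, which mirrors the current output, while as_mf1 is empty because nothing under the hfeed was upgraded. The cost of the approach is that the parser has to carry extra per-element state through the backcompat pass instead of rewriting class names in place.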

I punted on it at the time because it seemed complex to solve and I was not aware of any examples of it causing issues.