microformats/php-mf2

elements with invalid class names beginning with "h-" are parsed as mf with empty type

sknebel opened this issue · 1 comments

http://microformats.org/wiki/microformats-2-parsing says:

The "*" for root (and property) class names consists only of lowercase a-z and '-' characters.

When php-mf2 discovers a class beginning with h- that is not a valid name according to this specification, it still parses the element as if it were a microformat root but sets "type": [], which I assume will crash quite a few consuming applications that expect type[0] to always exist.

I initially discovered it with this typo:

<div class="h-entry>">
<a href="https://example.com" class="u-url">content</a></div>

which parses as

{
    "items": [
        {
            "type": [],
            "properties": {
                "name": ["content"],
                "url": ["https://example.com"]
            }
        }
    ],

Other examples include class="h-👍", class="h-hentry_" or class="h-"

Good catch. Instead of filling in the type in these examples, I think it shouldn't parse anything, since they're not valid root class names. That character restriction was added to avoid parsing mixed-case helper classes like h-SomeFormatting. I was pretty sure that was working, but it looks like I've introduced a regression somewhere. Edit: Looks like I was only testing it on property elements, not root elements.
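
For illustration, the proposed behaviour (ignore invalid root class names rather than emit an empty type) could be sketched roughly as follows. This is not php-mf2's code: it is a minimal Python sketch, the names ROOT_CLASS_RE, root_types and is_microformat_root are hypothetical, and it assumes the rule exactly as quoted from the parsing spec above, plus the assumption that at least one character must follow "h-".

```python
import re

# Rule as quoted above: after the "h-" prefix, only lowercase a-z and '-'.
# Requiring at least one character is an assumption, so a bare "h-" is rejected.
ROOT_CLASS_RE = re.compile(r"h-[a-z-]+")


def root_types(class_attr):
    """Return the valid root class names found in a class attribute value."""
    # str.split() with no arguments splits on runs of whitespace.
    return [name for name in class_attr.split() if ROOT_CLASS_RE.fullmatch(name)]


def is_microformat_root(class_attr):
    """Treat an element as a root only if it has at least one valid root class."""
    return bool(root_types(class_attr))
```

With a check like this, class="h-entry>" yields no root types, so the element would not be parsed as a microformat at all, and a consumer would never see "type": [].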