microformats/microformats2-parsing

vcp: innertext in value-class-pattern needs clarification

kartikprabhu opened this issue · 3 comments

The value-class-pattern for date and time parsing refers to innertext in many places.

Currently it is not clear exactly what innertext means. It would be good to have more direction, similar to the textContent definition in, for example, http://microformats.org/wiki/microformats2-parsing#parsing_a_p-_property

If I read the mf2py source correctly, right now it just uses textContent? php-mf2 uses textContent with stripped whitespace, so close to what you are proposing. I think this makes sense: replacing all instances of `inner-text` on http://microformats.org/wiki/value-class-pattern with a definition like the one in the mf2-parsing-spec:

the textContent of the element after removing all leading/trailing whitespace and nested <script> & <style> elements.
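To make the proposed definition concrete, here is a minimal sketch in Python using only the standard library. It is not taken from mf2py or php-mf2; the names `_InnerTextParser` and `vcp_inner_text` are hypothetical, and it assumes the element is passed in as an HTML fragment string.

```python
from html.parser import HTMLParser


class _InnerTextParser(HTMLParser):
    """Collects text nodes, skipping anything nested in <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def vcp_inner_text(html_fragment):
    """Hypothetical helper: textContent minus <script>/<style>, then trimmed."""
    parser = _InnerTextParser()
    parser.feed(html_fragment)
    parser.close()
    # Only leading/trailing whitespace is removed; inner whitespace is kept.
    return "".join(parser.parts).strip()


fragment = '<span class="value"> 2013-03-14 <script>var x = 1;</script></span>'
print(vcp_inner_text(fragment))
```

Note that whitespace inside the text is left untouched under this reading, so parsers that collapse inner whitespace would still give different results.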

Do we want the `p-*`-style modification that also replaces images?

replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative;
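For comparison, a rough sketch of what the image-replacing variant could look like, again in standard-library Python. The names `_ImgAwareTextParser` and `p_style_text` and the `base_url` parameter are hypothetical. It also assumes the added spaces apply only to the src case, which is one possible reading of the wording above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ImgAwareTextParser(HTMLParser):
    """Collects text, skips <script>/<style>, and substitutes text for <img>."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            attributes = dict(attrs)
            if "alt" in attributes:
                # A valueless alt attribute is reported as None; treat as empty.
                self.parts.append(attributes["alt"] or "")
            elif attributes.get("src"):
                # Assumption: the surrounding spaces apply to the src case only.
                resolved = urljoin(self.base_url, attributes["src"])
                self.parts.append(" " + resolved + " ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def p_style_text(html_fragment, base_url):
    """Hypothetical helper: trimmed text with <img> replaced by alt or src."""
    parser = _ImgAwareTextParser(base_url)
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.parts).strip()


print(p_style_text('<span><img alt="Jane" src="j.png"> Doe</span>',
                   "http://example.com/people/"))
```

For date and time values an image substitution rarely produces anything parseable, which may be an argument for keeping the simpler definition in the value-class-pattern.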

Process question: can/should we touch the value-class-pattern page (which also documents mf1), or should this be defined on the mf2-page as a clarification of the link to it?

Related issue: #15

I think value-class-pattern is so little used that this might not be an issue in real world examples anymore; but this is just a guess.

It might come up when testing different parsers against each other.