Define removal of SCRIPT and STYLE elements everywhere textContent is requested.
Zegnat opened this issue · 6 comments
In practice parsers are already doing this everywhere, but that is currently against the specification. I say this is a mistake in the spec and not in parsers.
When the textContent value is used in mf2, we specify the removal of <script> and <style> elements for p-x, u-x, and dt-x parsing, but not for e-x or implied name parsing.
According to spec:
<div class="h-x">Hello <script>beautiful </script>person</div>
This results in an implied name of "Hello beautiful person".
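To make the problem concrete, here is a minimal sketch of an unfiltered textContent, using Python's standard `html.parser`. The class and function names (`NaiveTextContent`, `naive_text_content`) are hypothetical and not taken from any mf2 parser; the sketch only approximates DOM textContent by concatenating every text node.

```python
from html.parser import HTMLParser


class NaiveTextContent(HTMLParser):
    """Hypothetical helper: collects every text node, script contents included."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Text inside <script> and <style> arrives here like any other text.
        self.parts.append(data)


def naive_text_content(html):
    parser = NaiveTextContent()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


example = '<div class="h-x">Hello <script>beautiful </script>person</div>'
implied_name = naive_text_content(example)
print(implied_name)  # Hello beautiful person
```

The script body ends up in the name because nothing in this reading of the spec tells the parser to drop it.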
Previous issue and resolution: http://microformats.org/wiki/microformats2-parsing-issues#exclude_style_elements_before_parsing
It appears we might have just missed some instances in the spec update, but I need to double-check and confirm. See this revision.
Given what @gRegorLove found, I'd say the missing pieces are:
a) specify the same for the value version of an e- property (which was likely missed since the html was explicitly excluded in the discussion)
b) in the section about implied name properties, make it clear that textContent should be postprocessed the same way as for p- properties.
Proposed updates, which I believe are in line with the resolution:
parsing a p- property
No content change, just splitting out whitespace trimming into a separate bullet point:
Original:
- else return the textContent of the element after:
  - dropping any nested <script> & <style> elements;
  - replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving any relative URLs, and removing all leading/trailing whitespace.
Updated:
- else return the textContent of the element after:
  - dropping any nested <script> & <style> elements;
  - replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative
  - removing all leading/trailing whitespace.
parsing an e- property
Original:
Updated:
- value: the textContent of the element after:
  - dropping any nested <script> & <style> elements;
  - replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative
  - removing all leading/trailing whitespace.
parsing for implied properties
For implied name:
Original:
- else use the textContent of the .h-x for name
Updated:
- else return the textContent of the .h-x after:
  - dropping any nested <script> & <style> elements;
  - replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative
  - removing all leading/trailing whitespace.
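Since all three updates describe the same postprocessing, one shared helper could serve p-, e- value, and implied name parsing. Below is a rough sketch in Python using only the standard library. The names `CleanTextContent` and `clean_text_content` and the `base_url` parameter are assumptions made for illustration, and padding only the src replacement (not the alt text) with spaces is one reading of the proposed wording, not a confirmed interpretation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CleanTextContent(HTMLParser):
    """Hypothetical helper approximating the proposed textContent postprocessing."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            attributes = dict(attrs)
            if "alt" in attributes:
                # Prefer alt when present; a valueless alt counts as empty.
                self.parts.append(attributes["alt"] or "")
            elif attributes.get("src"):
                # Assumption: only the src fallback is padded with spaces.
                resolved = urljoin(self.base_url, attributes["src"])
                self.parts.append(" " + resolved + " ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Drop text nodes nested in <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def clean_text_content(html, base_url=""):
    parser = CleanTextContent(base_url)
    parser.feed(html)
    parser.close()
    # Final step: remove all leading/trailing whitespace.
    return "".join(parser.parts).strip()


example = '<div class="h-x">Hello <script>beautiful </script>person</div>'
print(clean_text_content(example))  # Hello person
```

A real parser would more likely run these steps on a DOM tree; the streaming approach here is only meant to keep the sketch dependency-free.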
LGTM
Updated in spec with this revision: http://microformats.org/wiki/index.php?title=microformats2-parsing&oldid=66660
This issue can be closed now.
Closing as this has been updated and I’m not sure why it was kept open anyway.