Automattic/o2

Tags: text that looks like a hashtag in code blocks and link text should not be added to a post

Closed this issue · 3 comments

Currently o2 is zealous in grabbing all tags in #hashtag format and adding them as post tags in the standard "post" post type taxonomy.

I'm pretty sure (though I could be wrong) that the design intent was to ignore hashtag-like text inside code snippets, link hrefs or link text, and pre-formatted HTML.

Evidence here:

o2/inc/tags.php

Line 277 in 609e3f5

if ( ! empty( $parent->tagName ) && in_array( strtolower( $parent->tagName ), array( 'pre', 'code', 'a', 'script', 'style', 'head' ) ) ) {
That check skips text whose parent is one of the listed HTML tags.
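
For anyone following along, here is a rough sketch of how I understand that kind of check is meant to behave. It is not o2's actual code: `sketch_find_hashtags()` and the hashtag regex are hypothetical, and I've assumed it walks every ancestor rather than only the immediate parent.

```php
<?php
// Hypothetical helper (not part of o2): collect hashtags from text nodes,
// skipping any text that sits inside one of the ignored elements.
function sketch_find_hashtags( $html ) {
	$ignored = array( 'pre', 'code', 'a', 'script', 'style', 'head' );

	$dom = new DOMDocument();
	// Collect parser messages internally instead of emitting PHP warnings.
	$previous = libxml_use_internal_errors( true );
	$dom->loadHTML( '<div>' . $html . '</div>' );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	$xpath = new DOMXPath( $dom );
	$tags  = array();

	foreach ( $xpath->query( '//text()' ) as $text_node ) {
		// Assumption: check every ancestor element, not just the direct parent.
		$skip = false;
		for ( $parent = $text_node->parentNode; $parent instanceof DOMElement; $parent = $parent->parentNode ) {
			if ( in_array( strtolower( $parent->tagName ), $ignored, true ) ) {
				$skip = true;
				break;
			}
		}
		if ( $skip ) {
			continue;
		}

		// Assumed hashtag pattern: "#" not preceded by a word character or "&".
		if ( preg_match_all( '/(?<![\w&])#([\w-]+)/', $text_node->nodeValue, $matches ) ) {
			$tags = array_merge( $tags, $matches[1] );
		}
	}

	return array_values( array_unique( $tags ) );
}
```

Under those assumptions, a post containing a linked hashtag, a hashtag inside a code element, and a plain-text #tag-plain would yield only tag-plain.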

This doesn't seem to be working, though. :/

Steps to repeat

Start a new post, and use post content such as:

This one is in a link: <a href="https://www.google.com/">#tag-google</a>.

This one is in a code block: <code>#tag-code-what</code>

This one is plain text: #tag-plain

Publish the post. Then check which post tags were added (you can verify this on the wp-admin post editor screen).

What I expected

I'd expect only #tag-plain to be added to the post as a WordPress post tag.

What happened instead

All three items that are vaguely hashtag-looking are added as post tags.

The same thing happens in comment text.

What does appear to work is wrapping the tag in the [code] ... [/code] shortcode: those tags are not added to the post.

pento commented

Wow, that's been broken for a long time.

It's caused by the htmlentities() call here, which is obviously incorrect in retrospect. :-)
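
A minimal sketch of why that would defeat the parent-tag check, assuming the content is passed through htmlentities() before DOMDocument parses it (the helper name `sketch_count_code_elements()` is hypothetical, not from o2):

```php
<?php
// Hypothetical helper: count how many <code> elements DOMDocument actually sees.
function sketch_count_code_elements( $html ) {
	$dom = new DOMDocument();
	// Collect parser messages internally instead of emitting PHP warnings.
	$previous = libxml_use_internal_errors( true );
	$dom->loadHTML( '<div>' . $html . '</div>' );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	return $dom->getElementsByTagName( 'code' )->length;
}

$content = 'This one is in a code block: <code>#tag-code-what</code>';

// Raw markup: the parser builds a real <code> element.
$raw_count = sketch_count_code_elements( $content );

// Entity-encoded markup: "<code>" becomes "&lt;code&gt;", which the parser
// treats as plain text, so there is no <code> parent left to check against.
$encoded_count = sketch_count_code_elements( htmlentities( $content ) );
```

Here `$raw_count` is 1 and `$encoded_count` is 0, so after encoding every hashtag looks like plain text to a parent-tag check.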

It was originally added to avoid warnings that DOMDocument would raise on some HTML. I'm inclined to add LIBXML_NOWARNING | LIBXML_NOERROR to the loadHTML() call when WP_DEBUG is disabled. I don't recall the old behaviour causing data loss, just irrelevant messages.
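
Roughly what I have in mind, as a sketch only: `sketch_load_html()` is a hypothetical wrapper, and the real change would go wherever o2 calls loadHTML().

```php
<?php
// Hypothetical wrapper: pass the libxml flags to loadHTML() only when
// WP_DEBUG is undefined or false, so debug sites still see parser messages.
function sketch_load_html( $html ) {
	$dom     = new DOMDocument();
	$options = 0;

	if ( ! defined( 'WP_DEBUG' ) || ! WP_DEBUG ) {
		$options = LIBXML_NOWARNING | LIBXML_NOERROR;
	}

	// The HTML is passed through unmodified, with no htmlentities() call,
	// so elements such as <code> and <a> survive as real DOM nodes.
	$dom->loadHTML( $html, $options );

	return $dom;
}
```

With this approach the markup is left intact, so the existing parent-tag check still has real elements to inspect.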