ampproject/amp-wp

Unrecognized elements are silently removed without validation error reporting

westonruter opened this issue · 4 comments

When coming across an element that is not recognized at all, the following code in process_node will be invoked:

https://github.com/Automattic/amp-wp/blob/1e2cd22f1bfe8ab9d833c901e02111120f52fe72/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L284-L292

In the call to replace_node_with_children it will call remove_node instead of remove_invalid_child, and only the latter will report the removal. This needs to be improved to report invalidity when it is being removed for that reason.

Question About remove_invalid_child()

Hi @westonruter,
Thanks for your detailed description.

I'm probably not understanding this. But it looks like when replace_node_with_children() calls remove_node(), remove_node() calls remove_invalid_child(). Is that enough to report the removed node?

https://github.com/Automattic/amp-wp/blob/024e450798c35a89c3390c410ab42522a924f712/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1700-L1709

I think I understand this now 😄

Recreating the Issue

Using this HTML in a new post:

<h2>Finds and reports these invalid elements</h2>

<imgs src="/none.jpg" width="100" height="100" alt="None"></imgs>

<baz class="baz"><p>Invalid baz parent element.</p></baz>

<h2>Silently removes these elements</h2>

<foo class="foo">Invalid baz tag.</foo>

<foobaz class="foobaz"><zab>Invalid <span>nested elements</span></zab></foobaz>

<bazbar invalid="bazbar"><span>Is an invalid "bar" tag.</span></bazbar>

<div class="parent">
	<p>Nesting valid and invalid elements.</p>
	<invalid class="invalid">Is an invalid "invalid" tag</invalid>
	<bazfoo class="bazfoo">Is an invalid "foo" tag <p id="testing-id" style="width: 100px">This should pass.</p></bazfoo>
</div>

<ul>
	<li>hello</li>
	<lili>world</lili>
</ul>

Then publish the post.

Results

  1. It catches <imgs> and <baz> and reports these 2 elements as invalid.
  2. It does not catch the other invalid elements and silently removes them: <foo>, <foobaz>, <zab>, <bazbar>, <invalid>, <bazfoo>, and <lili>.

screen shot 2018-07-25 at 2 54 15 pm

Tracing Issue via Xdebug

The 2 invalid elements that were caught flowed into replace_node_with_children() and into the if code block to run $this->remove_node( $node );:

https://github.com/Automattic/amp-wp/blob/024e450798c35a89c3390c410ab42522a924f712/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1662-L1668

The other invalid elements that were not caught flowed into the else code block:

https://github.com/Automattic/amp-wp/blob/024e450798c35a89c3390c410ab42522a924f712/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1668-L1687

Possible Problem

There is no validation error being generated in the else for the node that's being replaced by the fragment after pushing child nodes back onto the stack.

Moving To "Ready For Merging"

If it's alright, I'm moving this to "Ready For Merging." If you think this could use functional testing, feel free to move it back.