ampproject/amp-wp

Strip whitespace from data: URLs in stylesheets to prevent AMP validation error

jonhendershot opened this issue · 33 comments

Validator reports CSS syntax error in tag 'style amp-custom' - bad url.

This is still throwing errors on Google Search Console. See ticket #604.

Worth noting that we're on v0.7 of the plugin to fix sanitization issues with the carousel, as per your recommendation in ticket #1076

screen shot 2018-04-20 at 1 40 12 pm

@jonhendershot I can confirm this is an error v0.6. However, it is fixed when I check out the 0.7 branch due to the fix merged in #1004. Note that this was not part of 0.7-beta1. Are you trying with current build of the 0.7 branch (with instructions in #1076 (comment))? Note that we're planning to package a 0.7-rc1 release on Monday.

I can see that it was not fixed in 0.7-beta1, but it is fixed in the current state of the 0.7 branch.

Thanks for such a quick reply @westonruter - I do believe we're packaged with the current build via the instructions given in #1076 . We would've built from it earlier this week.

@jonhendershot please share the URL for the page in question. Also, try running the output through https://validator.ampproject.org/ since it could be that the Google Search Console is reporting based on a cached version.

@westonruter We've got quite a few instances of this error on the site currently. Just checked it again on https://validator.ampproject.org/ and confirmed seeing this issue.

https://www.wellandgood.com/good-looks/millie-bobby-brown-red-carpet-sneaker-look/amp/

@jonhendershot I'm seeing:

<meta name="generator" content="AMP Plugin v0.7-beta1" />

Would you please do another build off of the 0.7 branch to make sure it is on the latest? Note that once you do this you should see the generator as:

<meta name="generator" content="AMP Plugin v0.7-beta1-d65578d-20180420T185722Z" />

@jonhendershot it would be great to get confirmation on this ASAP as we're planning to release v0.7-RC1 on Monday.

@jonhendershot we're going to proceed with the 0.7-RC1 release shortly.

hi @westonruter -- Sorry for the delay. We've updated the plugin and are still seeing the same issues. We've staged the updated plugin in the following environment:
https://amp-slides-well-good.pantheonsite.io/good-looks/millie-bobby-brown-red-carpet-sneaker-look/amp/

Confirmed that we're seeing the following meta tag in our head:
<meta name="generator" content="AMP Plugin v0.7-beta1-d65578d-20180420T213729Z" />

@jonhendershot thanks, I was able to replicate the issue this time. The issue is that there seems to be a space in the original data: URL in your content:

background:url(data:image/png; base64,ivborw0kggoaa...

If I copy your original markup and remove this extra space then the result properly validates. Can you try that?

Either way, this should be fixed, but it's not as bad an issue as #604 was, since it completely corrupted the data: URL.

@westonruter gotcha - thanks! I'm not totally clear, though -- it seems that that string is something generated by the plugin. I can't seem to find it in our codebase anywhere so I'm assuming it's being built somewhere. Should I be digging deeper to find a solve, or is this something on your side?

@jonhendershot The style rule is coming from an inline style. Specifically, it is coming from this block of HTML:

<blockquote class="instagram-media" style="background: #FFF; border: 0; border-radius: 3px; box-shadow: 0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width: 658px; padding: 0; width: calc(100% - 2px);" data-instgrm-permalink="https://www.instagram.com/p/BePtv5Ug43B/" data-instgrm-version="8">
<div style="padding: 8px;">
<div style="background: #F8F8F8; line-height: 0; margin-top: 40px; padding: 62.5% 0; text-align: center; width: 100%;">
<div style="background: url(data:image/png; base64,ivborw0kggoaaaansuheugaaacwaaaascamaaaapwqozaaaabgdbtueaalgpc/xhbqaaaafzukdcak7ohokaaaamuexurczmzpf399fx1+bm5mzy9amaaadisurbvdjlvzxbesmgces5/p8/t9furvcrmu73jwlzosgsiizurcjo/ad+eqjjb4hv8bft+idpqocx1wjosbfhh2xssxeiyn3uli/6mnree07uiwjev8ueowds88ly97kqytlijkktuybbruayvh5wohixmpi5we58ek028czwyuqdlkpg1bkb4nnm+veanfhqn1k4+gpt6ugqcvu2h2ovuif/gwufyy8owepdyzsa3avcqpvovvzzz2vtnn2wu8qzvjddeto90gsy9mvlqtgysy231mxry6i2ggqjrty0l8fxcxfcbbhwrsyyaaaaaelftksuqmcc); display: block; height: 44px; margin: 0 auto -44px; position: relative; top: -22px; width: 44px;"></div>
</div>
<p style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; line-height: 17px; margin-bottom: 0; margin-top: 8px; overflow: hidden; padding: 8px 0 7px; text-align: center; text-overflow: ellipsis; white-space: nowrap;"><a style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" href="https://www.instagram.com/p/BePtv5Ug43B/" target="_blank" rel="noopener">A post shared by Millie Bobby Brown (@milliebobbybrown)</a> on <time style="font-family: Arial,sans-serif; font-size: 14px; line-height: 17px;" datetime="2018-01-22T08:46:26+00:00">Jan 22, 2018 at 12:46am PST</time>
</div>
</blockquote>

Is this coming from an Instagram embed? Please share the underlying post_content for this embed so we can reproduce.

I can see actually that this is the Instagram embed code they supply. Here's another embed code snippet just copied from a post:

<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-permalink="https://www.instagram.com/p/Bh7jRUkBIzB/" data-instgrm-version="8" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:8px;"> <div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;"> <div style=" background:url(); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div></div> <p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/Bh7jRUkBIzB/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">Introducing the new “Psycho Kaler”! Kale, Calabrian Chilies, Cream, Mozzarella, Garlic, and Ricotta. #pizzaweekiseveryweek</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by <a href="https://www.instagram.com/fillmorepdx/" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px;" target="_blank"> Info@filmorepdx.com</a> (@fillmorepdx) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2018-04-23T22:55:39+00:00">Apr 23, 2018 at 3:55pm PDT</time></p></div></blockquote> <script async defer src="//www.instagram.com/embed.js"></script>

Notice here that there is no space in the background image data: URL. It appears that your code was modified to add the space.

@westonruter yah good call -- thanks for digging through that for me!! Looks like that code is being added via the TinyMCE content editor, so it'll be on content management and/or a content filter to protect against it. Thank you again for such quick replies this week!

I'll re-open this because it's something we can fix in CSS sanitizer to prevent extra spaces from being added in the first place.

To fix the issue, this is where I would work:

https://github.com/Automattic/amp-wp/blob/1e2cd22f1bfe8ab9d833c901e02111120f52fe72/includes/sanitizers/class-amp-style-sanitizer.php#L749-L758

The real_path_urls method should be renamed to normalize_urls, and then instead of just skipping data: URLs, it should remove all whitespace before continuing.

ah, wonderful! Thanks @westonruter -- will keep an eye on this

@westonruter just wanted to follow up on this and see if it has been addressed yet. Thanks!

@jonhendershot no it has not.

@westonruter ok cool - any estimate on when we can expect to see this patched?

It will be part of the 1.0 release which would be coming this summer. I suggest implementing an alternative with a the_content filter in the mean time.

Mte90 commented

I fixed with this code:

add_filter( 'the_content', 'fix_amp_space' );

function fix_amp_space( $content ) {
    if ( is_amp_endpoint() ) {
        return str_replace('data:image/png; base64,', 'data:image/png;base64,', $content);
    }
    return $content;
}

But I don't understand why in 0.8 this fix cannot be implemented because there are a lot of people reporting this problem.

The fix here using the_content is more of a workaround than a proper fix (as the fix here won't apply to CSS in external stylesheets). Only with 1.0 do we have an actual CSS Parser which can be used to extract the data: URLs from stylesheets. In 0.7 the only option is to do some string replacement like this. But it's a pretty brute-force fix.

@amedina Instead of removing the spaces from the data: URL it may be better to wrap the content in quotes. This would also fix the issue of using SVG in data: URLs per: https://wordpress.org/support/topic/background-gradients-invalid/

Wrapping the dataUri with quotes is indeed the better approach here.

@mehigh why? URLs don't allow whitespace so just removing whitespace seems to have the same effect, and it also has the added benefit of saving at least 3 bytes.

I googled. What I knew was not up to date.. this was actually something I picked up from the CSS validator years back, but it seems this was reported as a bug which got fixed .. 3 years ago, since quotes are optional for data uris as well. I've always used quotes myself though.
w3c/css-validator#42 (comment)

TL;DR: dropping the quotes for data embeds is OK.

Chiming in after getting AMP regressions.

Since removing the whitespace works but is a little brute force, could we wrap in the single quote to support legacy embeds (like Instagram)? I wouldn't call it better but at least it would fix the errors until 1.0 version is ready.

@davisshaver Regressions as in this wasn't a problem in 0.6?

If the issue is specifically Instagram embeds, then the proper solution for that would be in #1103. It seems WordPress has a tendency to add spaces inside the data: URL and so this might be actually an issue with TinyMCE. In the general case, the fix is to target the data: URLs as in #1164 to remove the spaces. This targeting of URLs is most reliable when we have an actual CSS parser, which we do as of 1.0-alpha. In the mean time, I suggest in the mean time using a the_content filter on your own site to strip out the invalid spaces as a temporary workaround, like in #1089 (comment)

Nice! Thanks @westonruter. By regression I just meant that formerly-validating AMP pages were no longer validating. It looks like @amedina's patch will fix this though. 🚀

formerly-validating AMP pages were no longer validating

@davisshaver right, but are they no longer validating because of a change in the plugin or a change on your site?

@westonruter Sorry I really don't know... I think the change in plugin caused it, as I don't believe anything related to inline images changed on the site; the content affected was all archival (Feb 2018 or earlier); and I don't recall seeing this error at any point previously. BUT I am unable to verify that the pages were at any time indexed in AMP, as analytics data shows no hits to the pages, and I didn't keep clean enough records of plugin versions to put this into context of v0.6/v0.7. Sorry I can't be more help.

Steps To Test

Hi @csossi,
Could you please test this?

  1. Create a new post on the Native AMP staging site
  2. Copy this Instagram URL into the post editor: https://www.instagram.com/p/BjnK5eGlil4
  3. This should render as an Instagram post
  4. Visit the front-end of the post
  5. Expected: There are no AMP validation errors, when using the Chrome AMP extension, or the AMP Validator