tc39/proposal-iterator-helpers

Web incompatibility discovered in theathletic.com

syg opened this issue Β· 58 comments

syg commented

Chrome discovered a web incompatibility and is unshipping the iterator helpers proposal for now.

cc the other browser folks @dminor @msaboff

The incompat is reproduced below from the issue for ease of reading. Please see https://bugs.chromium.org/p/chromium/issues/detail?id=1474613 for full details.

  1. theathletic.com uses a product called airgap.js. This product has a "tamper proof" mode that tries to freeze built-in collections' iterators and iterator prototypes. It has the following snippet:
            let e = (s = (r = R) == null ? void 0 : r.Iterator) == null ? void 0 : s.prototype;
            e && Dt(e);

where R is the global scope, and Dt is a utility function that ultimately called Object.freeze on the argument. The effect of this snippet is that if there's a global called Iterator, Iterator.prototype becomes frozen.

  1. theathletic.com, like many sites, ships an old version (< 0.13.8) of the regenerator runtime to polyfill generators. The runtime makes its own version of the Generator function, including setting up the prototype chain. Generator.prototype's [[Prototype]] is Iterator.prototype. Prior to regenerator 0.13.8, setting up the generator code looked like the following:
  var Gp = GeneratorFunctionPrototype.prototype =
    Generator.prototype = Object.create(IteratorPrototype);
  GeneratorFunction.prototype = Gp.constructor = GeneratorFunctionPrototype;
  GeneratorFunctionPrototype.constructor = GeneratorFunction;

The assignment Gp.constructor = GeneratorFunctionPrototype triggers what's commonly called the "override mistake" in JS, where a non-writable property that exists on some object's [[Prototype]] prevents a new data property from being created on that object. In this case, because airgap.js froze Iterator.prototype, Iterator.prototype.constructor is non-writable, causing this assignment to throw.

Regenerator runtime fixed this bug upstream 2 years ago in facebook/regenerator#411

ljharb commented

oof, the second one could maybe be addressed by folks updating regenerator-runtime, but the first one seems like we could never ship a global called Iterator.

Is there any insight into why they would have that snippet in the first place?

syg commented

Is there any insight into why they would have that snippet in the first place?

This product is closed source AFAICT. If someone has the time, feel free to reach out to Transcend?

IIUC, the issue is that an old version of regenerator is not compatible with the airgap.js "tamper proof" mode when both are loaded in an environment that includes an Iterator global. Am I alone in not considering this interaction bug to be a significant web compatibility issue?

ljharb commented

Freezing builtins is exceedingly rare, and I'd expect the sites that do to be aggressive in keeping up to date.

@gibson042 Due to its popularity, this website alone qualifies as a significant web compatibility issue.

As @ljharb mentions, I'm confounded by a website using a library that freezes intrinsics, yet does so on top of really outdated polyfills. Is there any chance to reach out to theathletic.com to update their frontend deps?

syg commented

As @ljharb mentions, I'm confounded by a website using a library that freezes intrinsics, yet does so on top of really outdated polyfills. Is there any chance to reach out to theathletics.com to update their frontend deps?

I'm already working internal channels to reach out to them, but non-Google leads would also be appreciated if anyone here has connections.

syg commented

To recap some convo with @bakkot, there are two ways to workaround this issue for theathletic.com:

  1. Reach out to theathletic.com and get them to update its bundled regenerator runtime. Babel chooses a version automatically depending on the downlevel version, so maybe this is not as easy as it sounds? I'm not sure.
  2. Reach out to transcend.io and get them to update their CDN-served airgap.js to workaround the issue by e.g. delete Iterator.prototype.constructor before doing its freezing. This would be transparent to theathletic.com.

I'm trying to do 1 above. Could I get a volunteer for trying 2 as well?

bakkot commented

cc @michaelfarrell76 @bencmbrook @eligrey @anotherminh as some people at Transcend according to Github. See above thread about a web compatibility issue caused by an interaction between airgap-js, regenerator-runtime, and a new JS language feature (Iterator helpers).

Sorry if y'all are the wrong people to ping; please forward this thread to appropriate people.

Thank you for bringing this to our attention! Depending on our analysis of this issue we may change airgap.js functionality to better accommodate the iterator helpers proposal.

I haven't gone over this entirely, but as a quick fix for the specific site in question we can manually disable our tamper resistance feature for The Athletic with their approval. We're reaching out to them now.

For reference, here is a snippet that is representative of the airgap.js tamper resistance code in question that is causing the problematic interaction: https://gist.github.com/eligrey/c5a252f1f48cbdd8bf2b344b4ddaea65

The intention of this code is to assist in preventing malicious code from interfering with our library's network regulation capabilities. Our consent manager library was designed to coexist with and regulate potentially hostile adtech environments. Once the ShadowRealm API is shipping in all major browsers, we intend to explore using that as an alternative technique to achieve tamper resistance.

This snippet was initially written with a version of the iterator helpers proposal in mind. Ignoring the issue caused by the interaction with the old version of regenerator runtime, is this snippet at odds with any common use cases enabled by the iterator helpers proposal? I want to better understand if there are any changes that we should make on our end.

bakkot commented

This snippet was initially written with a version of the iterator helpers proposal in mind. Ignoring the issue caused by the interaction with the old version of regenerator runtime, is this snippet at odds with any common use cases enabled by the iterator helpers proposal?

I don't see any obvious other problems, just the override mistake thing. Of course if it runs before a polyfill of this proposal it will prevent that polyfill from taking effect, but that is a different kind of issue.

(And it's kind of overkill for your purposes; just freezing the Symbol.iterator and iterator's next properties would be sufficient (unless you are worried about someone adding .return or .throw, but in that case you'd also need to freeze Object.prototype to prevent them from being added there). But the overkill shouldn't cause problems except for the override mistake issues.)

Do note that making Iterator.prototype[Symbol.toStringTag] non-writable has exactly the same problem as making Iterator.prototype.constructor non-writable, i.e., it will break old versions of regenerator-runtime. Same probably goes for Iterator.prototype[Symbol.iterator].

@gibson042 Due to its popularity, this website alone qualifies as a significant web compatibility issue.

Seems to me like we just uncovered a bug at theathletic.com, which is rather easy to fix and likely will be in short order.

syg commented

Seems to me like we just uncovered a bug at theathletic.com, which is rather easy to fix and likely will be in short order.

@gibson042 I don't want to underplay the severity here and suggest that somehow the nature of the bug (i.e. "unlikely interaction") determines the severity. The practical impact remains that, if this hits Chrome stable without fixes upstream or unshipping, we break many customers, some of whom will blame Chrome, some of whom will blame NYT, and none of which is good for the web. With Chrome 117 going to stable next Tuesday, 3 business days is pretty short to not treat it seriously.

We lucked out here with Transcend being (1) so responsive and (2) airgap.js being shipped via CDN. The unlikely interaction nature of the bug isn't why we might get away with not unshipping after all, it's those incidental factors.

this is very unlucky.

some OT: We also freeze our intrinsic (including Iterator.prototype) with npmjs.com/ses, This is my first time knowing airgap.js also doing this. ses already grabs Iterator.prototype from the generator syntax, so the old regenerator-runtime is already broken in our environment. We manually patched it to fix this problem.

@eligrey You might consider replacing Iterator.prototype.constructor with a getter/setter before freezing. In SES, we call this an β€œenablement” to compensate for the override mistake. This may allow help airgap maintain its integrity and coexist with regenerator.

However, if integrity is your aim, capturing the intrinsics early and avoiding polymorphic calls may allow Airgap to defend its integrity against subsequent scripts without requiring frozen shared intrinsics. For SES, we use a no-polymorphic-call lint rule, capture intrinsics early, and use uncurried intrinsic methods.

bakkot commented

@kriskowal There's no ergonomic way to capture and use the originals for iterators specifically. Your options are either entirely avoiding iteration of arrays (etc) or freezing the relevant intrinsics.

However, if integrity is your aim, capturing the intrinsics early and avoiding polymorphic calls may allow Airgap to defend its integrity against subsequent scripts without requiring frozen shared intrinsics. For SES, we use a no-polymorphic-call lint rule, capture intrinsics early, and use uncurried intrinsic methods.

@kriskowal We already do all of that except for the lint rule. Thank you for the rule by the way! You won't find any polymorphic calls with unfrozen prototypes past a certain point in airgap.js builds.

Unfortunately, the prototype freezing snippet in question is part of a larger group of interventions used to compensate for something that is slightly less outside of our control: insecure polymorphic calls introduced by our JS bundler's runtime transpilation utilities. We don't currently have the engineering bandwidth to create a secure JS bundler with secure runtime transpilation utilities from scratch or maintain a fork of esbuild (our JS bundler), so we currently use these interventions to improve our security and integrity.

Seems to me like we just uncovered a bug at theathletic.com, which is rather easy to fix and likely will be in short order.

@gibson042 I don't want to underplay the severity here and suggest that somehow the nature of the bug (i.e. "unlikely interaction") determines the severity. The practical impact remains that, if this hits Chrome stable without fixes upstream or unshipping, we break many customers, some of whom will blame Chrome, some of whom will blame NYT, and none of which is good for the web. With Chrome 117 going to stable next Tuesday, 3 business days is pretty short to not treat it seriously.

We lucked out here with Transcend being (1) so responsive and (2) airgap.js being shipped via CDN. The unlikely interaction nature of the bug isn't why we might get away with not unshipping after all, it's those incidental factors.

@syg Sorry, let me elaborate. Unshipping or not is an implementation-specific decision, and I don't think it would be inappropriate here. But the interaction bug itself seems to be fixable by theathletic.com and any other affected siteβ€”which is to say, unshipping can provide a window for those fixes to be deployed, but changes in the proposal are not warranted IMO.

fyi with the long weekend pending, it sounds like the athletic won’t be updated until
tuesday

syg commented

but changes in the proposal are not warranted IMO.

@gibson042 Oh yes, that I agree with. This does not yet point to a cause for any redesign.

syg commented

@michaelfarrell76 @eligrey Is Transcend comfortable sharing how many customers airgap.js has that has tamper-resistance mode turned on? The working assumption is that there's a lot of old regenerator runtime floating out in the wild, but freezing intrinsics is rare. Since there's a mode of airgap.js that does that, knowing how widespread would be helpful.

While I cannot share our customer count or CDN traffic data, I can share that our tamper resistance mode is enabled by default, and customers decide when to upgrade their build & configuration of the airgap.js SDK.

We will send out a heads-up notice to all of our consent manager customers to ensure that regenerator runtime is up-to-date if used, or alternatively to disable our tamper resistance mode.

ya like eli said, it’s on by default and every deploy is controlled by the customer. we can change our default and notify our customers but it will likely take around 2-6 weeks to ensure every customer updates to use that new setting. some of our customers work on a monthly deploy cycle.

i’m not sure i have the exact number or can disclose, there are a few customers that even self host the script, but we’re talking a few thousand unique domains that are likely affected. you can get a sense of some of those customers that you can disclose by looking at the logos on our site: https://transcend.io/

our fortune 500 presence is not reflected in the logos on our site - many of which are big google ads customers. but you’ll see there are some pretty heavily trafficked consumer sites that we serve outside of the fortune 500, like vimeo, gofundme, eventbrite to name a few that are shown on our site with serious consumer traffic

the athletic made the change to update their site - if someone could confirm that it no longer causes the issue, that'd be great!

bakkot commented

I see there's been another incompat in the wild, and Chrome is going to unship.

Looks increasingly likely that the proposal will not be able to ship in its current form unless and until transcend is able to update its library across all its customers, or we manage to get everyone to move off of the very old regenerator runtime.

did this ship? we just got reports from a few customers today that their sites are crashing.

@bakkot we are working to coordinate the rollout for our customers. we would great appreciate rolling this back and giving us a few weeks to get this rolled out.

as i mentioned earlier, many of our customers work on a monthly release cycle

bakkot commented

I'm not on the Chrome team but my understanding was that they'd killed this via Finch on Monday, so if you're still getting reports that's surprising. cc @syg

@bakkot we only got reports starting this morning strangely - @syg do you know whats happening?

syg commented

I've been investigating. I can confirm that the killswitch is rolled out, and 117 branch builds have picked it up. I think we accidentally typoed something. Apologies for the bungle, we'll try to fix this by tomorrow.

@syg thank you so much - we are working urgently with the customers that were affected by this rollout, but we were confused as to how this happened given tickets like this made it sound like the rollout was being reverted on monday

syg commented

Confirming on Chrome's end that we did submit the killswitch with a typo on the feature name, and the fix should be picked up by tomorrow's finch rollout.

thanks you πŸ™

bakkot commented

In case it helps, here's a snippet which will detect old versions of regenerator runtime (older than 0.13.10 - the fix for this went into 0.13.8, so there might be some false positives, but it will at least give a starting point):

function probablyHasOldRegeneratorRuntime() {
  if (typeof regeneratorRuntime === "undefined") return false;
  try {
    regeneratorRuntime.keys("abc")();
    return false;
  } catch (e) {
    return true;
  }
}

@michaelfarrell76 and other Transcend folks, you should hopefully be able to use this to prioritize which customers to reach out to for your update rollout.

For setting expectations, do you think that you'll be able to update your affected customers in the immediate future, or is this going take a few months? If the latter we might add some ugly hacks to the proposal for compat - the proposal is pretty popular and we'd like to ship it if we can.

syg commented

@michaelfarrell76 The new finch config should be rolled out now, and I have confirmed locally that clients that have picked up those new configs will no longer have the iterator helpers proposal enabled.

How you can check:

  1. Navigate to chrome://version/?show-variations-cmd
  2. Search for the string JavaScriptIteratorHelpers/Disabled_Killswitch. If that string is present, then your Chrome has picked up the new finch config.
  3. Confirm that the Iterator global is undefined in the console.

Edit: Restarting the browser should pick up the new finch config.

oh wow @bakkot that was exactly what im looking for! i have a selenium script in place that is running over all of our sites and detecting if regenerator runtime was on it, the only missing piece was detecting the outdated version

thank you @syg for confirming !

@bakkot we've already started the process of notifying our customers, especially the ones with the largest footprint across the web. your snippet will be immensely helpful to streamline the effort, i can check back in early next week to relay the progress. we will do our best to unblock you all as soon as possible.

bakkot commented

@michaelfarrell76 BTW, feel free to email me at [my handle] @ gmail if you want to ask anything in a non-public forum.

@bakkot appreciate it! will do!

syg commented

For local testing on Chrome, you can re-enable the feature with the command line argument --enable-features=kJavaScriptIteratorHelpers.

PR for the accessor-based strategy to resolve this is now open at #287. Please review.

It was suggested by @nicolo-ribaudo in plenary today that we could drop the constructor property (and ship a writable Symbol.toStringTag) as an alternative to #287. This sounds like a reasonable interim solution.

bakkot commented

If airgap is freezing Iterator.prototype, we'd need to omit Symbol.toStringTag entirely as well, since freezing it would make it non-writable.

Yes, that's correct. The absence of Symbol.toStringTag would be more noticeable, but this is still an option worth considering.

ljharb commented

I think omitting both temporarily is a far better approach than shipping funky accessors, and it would allow us to resolve it this week instead of months from now.

An update on the Transcend side - we've migrated around ~85% of our customers off of our tamper resist setting. still pushing on those last few customers to upgrade

@michaelfarrell76 We discussed this at the recent TC39 meeting. The plan is to wait until the next meeting (2023-11-27), at which point we will decide whether to try shipping again or to adopt the backup strategy in #287. Please keep us updated on your customer upgrade roll-out. Do you think you will be able to make this timeline?

@michaelficarra always hard to promise 100% but i think there's a really really good chance!

most folks are migrated, we'll send out comms to the remaining customers ever ~2 weeks now until then!

syg commented

@michaelfarrell76 Thanks for the good news. Is there a possibility that you could provide a small sample of sites that have been updated (that are not theathletic.com, since that was done specially), so that we can spot-check that they are indeed working on Chrome with the feature turned back on? Understandable if you'd like that to be non-public, in which case please email me at syg@google.com.

@syg emailed you!

@michaelfarrell76 Thanks for providing the list of sites. We checked them on Chrome with the feature turned back on and unfortunately found breakage on two of the sites.

@rmahdav can you email me the sites so we can take a look? mike @ transcend.io

We've moved forward with the accessor approach. @michaelfarrell76 please keep us abreast of your update roll-out so that we may consider replacing the accessors with data properties at some point in the future.

ljharb commented

@michaelfarrell76 happy new year! any chance you could provide an update? we'd really like to change back to data properties ASAP, and we have a meeting coming in 3 weeks that I'd like to get consensus on the change at.

syg commented

@michaelfarrell76 happy new year! any chance you could provide an update? we'd really like to change back to data properties ASAP, and we have a meeting coming in 3 weeks that I'd like to get consensus on the change at.

Proposing that in the upcoming meeting in 3 weeks is rushed. The proposal with the accessors hasn't yet hit stable in Chrome. I'd like to reach a steady state for a release or two before more normative changes, in case other compat unknown unknowns come up.

ljharb commented

@syg fair enough; the info from Transcend would still be useful sooner than later.