HTML integration compatibility with assertion invariant
littledan opened this issue · 28 comments
In whatwg/html#7233, browsers are considering a significant change to how JSON and CSS modules work: The "destination" of the fetch would be based on the type. If I understand the specifications correctly, this means that CSS modules would have an Accept header which is different from scripts (in fetch step 13.2). This is a clear violation of the nature of "assertions", which is specified as,
moduleRequest.[[Assertions]] must not influence the interpretation of the module or the module specifier; instead, it may be used to determine whether the algorithm completes normally or with an abrupt completion.
Let's discuss what to do here. We should be careful about any relaxation, taking into account the feedback @devsnek and @ljharb have given, which motivated that restriction in the first place.
If we do make changes in the requirement, the keyword assert
in the syntax doesn't make as much sense. This keyword is designed to remind developers that the assertion does not affect the interpretation (instead, the MIME type of the response does, in the case of module types on the Web). Honestly, I don't know if this meaning resonated with anyone at all, so this may be a chance to reconsider the mental model. On the other hand, we may want to maintain the existing syntax, just given that this is already at Stage 3 and it's shipping in a browser and many tools already.
I would find it extremely distasteful for the linked html change to be implemented, but I also already find the entire assert-type-json dance they forced to be extremely distasteful so 🤷♂️
On the technical side what immediately jumps out to me is what happens when you have multiple imports for the same resource with different assertions. This isn't something we can resolve in our spec since we don't have a concept of a global cache, and I'm pretty sure it's just a race condition on their end, so you might have an unstable Accept
header depending on how the graph fills in :/
It's not a race condition in HTML as it is part of the cache key (maybe due to anticipating this particular way of forming fetches...)
Right. It also shows through the Sec-Fetch-Dest
request header. The problem it solves is that currently it doesn't enforce the correct CSP policy. This would fix that. And fetching it all as "script" also wouldn't easily allow for reusing existing optimizations in type-specific fetch-pipelines.
If you have an html file like
<script type=module src=a.mjs />
<script type=module src=b.mjs />
and the two files contain something like
import "c"
and
import "c" assert {type:'css'}
what happens if a.mjs and b.mjs take different amounts of time across the network?
The point of the assertion constraint is that you can only ever get one Module Record from a specifier. The presence or absence of an assertion, or its value, can cause the module to fail to load, but may not otherwise influence the result.
If CSS or JSON modules on the web only ever make a single request for a given specifier, then it would be fine.
Do i understand properly that if i try twice, once as css and once as json, this means the browser will send two requests for the same specifier with different headers, giving the server the chance to return different results for both?
If so, this would be fine as long as the browser does not allow both requests to result in a Module Record - iow, whichever was the first (to be initiated, or to be received, whatever HTML chooses - I’d assume to be initiated) would “win” and the other would act as if no module was available.
(I would also assume that an assertionless import succeeding would prevent it from succeeding with any assertions in the future, as well)
I think we already violate that: https://html.spec.whatwg.org/#integration-with-the-javascript-module-system (see second example). I wasn't super closely involved in that though. @domenic might know more.
@annevk i'm aware that there can be race conditions where it's violated, but my understanding is that altho the guarantee can't be enforced in browsers, that the common case will be that it's met. Hopefully confirmation is forthcoming.
I don't think there's a race condition here. I had a lot of trouble understanding what @domenic's motivation was when he was pushing that the module map be keyed by both the URL and the set of assertions, but now that we are seeing that the destination could be based on the type, this design seems perfect for HTML. It just doesn't follow the requirements of "import assertions".
I find the argument of, reusing not just the CSP checking but in general the load pipeline, to be very strong. Actually, it's surprising that we didn't think this through earlier in the proposal process; good catch @annevk .
I did want to ask: Will it be OK to use the "script" destination for both JavaScript and WebAssembly? We have a (mid-term, not near-term) goal that it should be possible to replace a JS module with a Wasm module without requiring changes to importers, so it would be nice if, in that particular case, the interpretation actually is driven by the response mimetype.
@littledan i'm still struggling to understand exactly how this violates the requirements of import assertions. Is it possible - without carefully constructing client and server code to produce it - to import the same specifier with different assertions (or none) and get different conceptual modules, and how exactly?
@ljharb My mental model of the invariant was always, that assertions are not conceptually part of the module map key. My wording in the spec was a little sloppy here, as it talks about interpretation rather than fetching, but I was thinking about fetching being conceptually part of interpretation.
As you and @devsnek pointed out, without them being part of the module map key, there is a potential for a race, since the server might return different results depending on the Accept
or Sec-Fetch-Dest
headers. This is precisely the issue.
I don't really understand what you mean by "without carefully constructing client and server code to produce it"--we're defining the entirety of the semantics, not just when everyone the server is doing what it should do. I'm not sure how we'd the assertion invariant to say it doesn't apply for poorly-behaved servers.
Maybe we could use "should" instead of "must" to do this weakening? IETF, WHATWG and W3C specs often apply "should" to server and client code, not just the underlying technologies; we don't do this in the JS spec yet, but we could.
I totally understand the difficulty - as worded in the spec for import assertions, there's two guarantees - the required one, which is milder - and the stronger one, which it's hoped will be met at least by any implementation that doesn't fetch modules over a network:
It is recommended but not required that implementations additionally conform to the following stronger constraint: each time this operation is called with a specific referencingScriptOrModule, moduleRequest.[[Specifier]] pair as arguments it must return the same Module Record instance if it completes normally.
The intention, and hope, was that all engines, including browsers, would maximally attempt to ensure this - meaning (and I'm speaking about this without any mental model of the web's "module map" or cache in my head) that any subsequent attempt to import a module X after the first attempt had completed would be "locked" to ensure that it returned the first attempt's Module Record. Any attempts to import X during that window (between first attempt and that attempt's fulfillment) would be nondeterministic in this regard.
When talking about race conditions above, I wasn't really getting at simultaneous imports, but rather the fact that what is imported depends on who requests first.
Anyway, the current HTML integration wouldn't really meet this property you describe either, as the module map entries for different assertions are completely independent--a server could return different mime type responses based on the destination in the header.
About module map intuition: I think all hosts will have something like a module map, given that the JS spec requires idempotency; I am not sure how to reason about all of this without thinking about this cache.
This is a value judgement of mine, but: given that this feature was designed partly to meet web needs, and given that the web is an important host environment for JS, we should work towards having the specifications on the JS and HTML sides agree with each other, down to the recommendations/should clauses.
I agree; but the feature existing in the language was predicated on being able to attempt meeting these goals - so part of the work is reevaluating whether this is the best way to solve the original problem, knowing the implementer feedback we know now.
I think the main problem here is the naming. Assertions shouldn't have side effects.
@annevk while i agree with that, the only thing that has consensus to be in the language atm is assertions.
A bit redundant to say we don't have consensus on a change here--we're just beginning to discuss this topic. Let's focus on technical matters in this thread and have any process discussion elsewhere.
It is recommended but not required that implementations additionally conform to the following stronger constraint: each time this operation is called with a specific referencingScriptOrModule, moduleRequest.[[Specifier]] pair as arguments it must return the same Module Record instance if it completes normally.
I think I've actually come up with a normal case that this would block:
Imagine we have the ability to import text as a regular import syntax (I think this is normal, unambiguous, and should probably live in the language instead of as an unofficial extension). For this purpose, imagine this is specified as import txt from "specifier" as "text"
. Here txt
would just be a string of whatever text was in the specifier file. Imagine also that there's a JSON import syntax, like … as "json"
. With this assumption, we can build the case.
I'm in a TS repo, and as part of my normal startup, I need to import the tsconfig.json
file to read some key (this is contrived, but we used to read the config to build the import mapping out of the paths field). I control the file, and I know for certain that the config is valid JSON syntax. So a natural way for me to import the config is via import config from 'path/to/tsconfig.json' as "json"
. Given the above restriction, that means that the config specifier can only be imported as json.
However, I'm now adding a node_module to my project. That node module itself imports the tsconfig.json
(maybe it's TS itself!). As you may know, tsconfig.json
isn't actually specified as JSON, it's JSON-C. So the only generic way for TS to be able to import the config is as text, it cannot use as "json"
syntax. It'd be natural to write import txt from "path/to/tsconfig.json" as "text"
and feed that into their JSON-C parser, but that'll conflict with the code the developer would write. (This could also be dynamic imports, doesn't need to be static)
By importing the config as JSON (again, I own the file and can assert it's real JSON syntax) the natural way, I prevent anything else in my program from loading the file the generic way for JSON-C parsing. Or by using the node_module written in its natural importing syntax, I'm prevented from importing it my way in my code. There's a composability hazard.
Given this example, I think the assertion keys is part of the caching key for a module. It may be a bug to have multiple evaluations of a particular specifier, but it's not definitively a bug.
@jridgewell that use case is very much not something that an assertion would be appropriate for - that’s some kind of “evaluators” proposal. It is definitely not intended for you to be able to pull in a specifier as both, say, text and json. I agree that importing JSONC makes that tricky, but if it’s not json it probably shouldn’t have the json extension either.
that use case is very much not something that an assertion would be appropriate for - that’s some kind of “evaluators” proposal
Isn't importing JSON the defining use case for import assertions? I'm suggesting that JSON is not the only non-JS file worth importing as part of the graph, and text is a natural extension, giving us a way to include contents in a platform-generic way. Vercel is apparently getting constant support requests about how to include text content in an app that'll be passed to some developer defined code.
It is definitely not intended for you to be able to pull in a specifier as both, say, text and json.
I think this is a semantic difference? It must be able to support either-or, but if you compose an app from distinct sources, we might hit cases that can't be composed together. Any kind of well-known file (like configuration files) might hit this.
I think it makes perfect sense to be able to import text, to be clear - in the language itself. I just don't see a use case for importing something as both text and another format. Any file format that isn't supported in the language could totally be importable as text - or as binary, perhaps - and then parsed/transformed at runtime into JS values.
I don't, however, think it makes sense for the same specifier to represent more than a single conceptual format, which means a single kind of module.
I don't, however, think it makes sense for the same specifier to represent more than a single conceptual format, which means a single kind of module.
The point of my example was that two separate parts of the program, which don't directly interact and independently work fine, cannot be composed and run as a single execution. I think that's a flaw in our design, and points to that a single specifier really can be two kinds of assets. With the feedback from browsers that the asset request pipeline could lead two separate modules for the same URL, I think the case is made stronger. The module cache key really is a tuple of specifier and assertions (or evaluators, or attributes, or whatever we call it).
Let's focus here on the web use case (which motivated this proposal) and how to handle sending the right "destination" to fetch, e.g., in cases like CSS modules, which were a motivating case for this proposal. Although I'm personally interested in use cases like the ones @jridgewell raises, we agreed as part of Stage 3 to leave them aside for now and pursue "evaluator" attributes separately (because we thought they weren't needed for this motivating case). However, it was always part of the aim of this proposal to handle CSS modules--we discussed this repeatedly in committee.
To me, it looks like we're learning assertions can no longer be the basis for different module types on the web. If anyone has ideas for how to recover/apply the assertion concept for this use case, I'd be interested in hearing it. I'd also be interested in hearing ideas for how this proposal could be modified to meet this core module-types-on-the-web use case. Those modified proposal versions might be useful for other use cases, and at some point it's useful to start thinking about those, but let's not get ahead of ourselves.
If assertions can't handle different module types on the web, then it seems like there's no point in having them, and we should re-evaluate how to best solve that approach in the language. That would be very important implementer feedback to provide for a stage 3 proposal.
Would it possible for the champions to bring the issues raised by the WhatWG proposal to the next plenary? Seem like there is some urgency here.
@msaboff Yes, we share this urgency, and it's on the agenda for the January plenary. We expect to have more thoughts posted in the next few days. Apologies for our delay.
(In the time that this issue has gone a bit dark, we've been discussing the topic in the fortnightly TC39 module calls. If you want to attend these calls, please reach out on Matrix.)
The invariant has been relaxed to permit what HTML needs. Thanks everyone for taking part in the discussion!