tc39/proposal-import-attributes

Inline vs out of line module attributes

littledan opened this issue · 62 comments

This proposal is all about putting module attributes inline in the JavaScript source text. I believe this is better because it would be annoying to have to switch to a separate resource file to write the module type, and keep these in correspondence. Probably it would only be practical to generate from tools, not write by hand. I'd prefer if we can reduce the number of administrative steps we need tools for.

What do you think? Does anyone think that we put the attributes out of line, in some sort of resource file? If so, where should we put them? (unclear whether import maps would be suitable)
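For concreteness, a hypothetical out-of-line resource file (this shape is entirely invented here, just to illustrate the question) might map specifiers to their attributes:

```json
{
  "moduleAttributes": {
    "/styles/theme.css": { "type": "css" },
    "/data/config.json": { "type": "json" }
  }
}
```

Keeping such a file in correspondence with the source text is the administrative step in question.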

I prefer out of band, since that would make it really clear when you're using a system that doesn't care about the attributes (like type:json on node.js). I do realize the pain of tooling though.

I vastly prefer in band due to overall DX. I think host environments can offer guidance on what they do with these attributes, and I'm not convinced that non-web environments would have no use for them.

bmeck commented

Is there a problem having both? I know we have an out of band solution for some stuff already in Node with policies. It avoids some pretty gross updating that in-band starts to show once you add things like SRI.

I can definitely see the argument for both/either when it comes to SRI. Is there any reason why we'd want both for type, rather than mandate its inclusion inline?

(Edit: and, I would be OK with mandating SRI be out of line--maybe this is the best option)

bmeck commented

@littledan

  • verbosity/synchronization issues if the specifier ever changes the destination format (via import map rewriting, or via an updated resource in a new format [JS module => CSS/HTML module etc.]); this can be compounded by re-export-focused modules (export * from 'X'; export * from 'Y';)
  • auditing is centrally located (this would require manifesting things for every import though)
  • saves parsing time/verification per location (node policies already have 100ms+ startup just parsing all the pieces of deduplicated SRIs on a small/medium CLI app on a modern laptop)
  • moves away from relying on programmer vigilance to do it at all possible locations, and prevents mismatches like import 'x' type: 'json' in one source text and then import 'x'; in a different source text.
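That last mismatch could be caught mechanically. A toy sketch (all names invented here, not part of any proposal) that flags a specifier imported with different type attributes in different source texts:

```js
// Toy illustration: given import records gathered from each import site,
// flag specifiers whose declared type attribute is inconsistent.
function findTypeMismatches(importRecords) {
  // importRecords: [{ specifier, type }], one entry per import site.
  const seen = new Map();
  const mismatched = new Set();
  for (const { specifier, type } of importRecords) {
    if (seen.has(specifier) && seen.get(specifier) !== type) {
      mismatched.add(specifier);
    }
    seen.set(specifier, type);
  }
  return [...mismatched];
}
```

For example, one site declaring type: 'json' and another importing the same specifier with no attribute would be reported as a mismatch.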

Well, this "why not both" option is interesting. Does anyone want to take up designing this external resource file, and we can see if we could keep them in correspondence as parallel efforts? We will probably need such a file for SRI, and I don't see any particular downside to allowing the module type either there or inline. However, I do think it would be harmful for ergonomics to require that module type be declared out of line.

I think it also somewhat solves #17, #9, and #7, since different environments can just load different out of band information, and for the web at least, that out of band information can also secure css @import and such.

bmeck commented

@littledan can you clarify that ergonomics concern a bit? Is specifying the type at each import location not unergonomic as well?

Specifying the type at the import site is not perfect, but it's better than having to manually create a separate file to put the type in. I only see those files as viable to be created by tools. (For cases where people use tooling to generate the type declarations, there isn't really an ergonomics issue either way.)

bmeck commented

@littledan the import location would need to determine the expected type ahead of time for completions to work, which is roughly equivalent to using the response MIME if it doesn't want some human intervention to decide, and that could be bad as it encourages blind acceptance. Also, for any runtime APIs like React.lazy or Worker, it would require extra care, and completions might be non-trivial to implement. I don't think the ergonomics problem is actually fixed by completions; it is certainly helped, though, in that completions allow easier filling of boilerplate.

To clarify, I'd propose that the semantics on the Web would be that the module graph will fail to load if any MIME type doesn't match what is declared. I personally hope other environments take analogous approaches (#10).
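As a concrete sketch of that check (the MIME table and function name here are invented, not spec text), a host-side verification might look like:

```js
// Hypothetical host-side check: the module graph fails to load when a
// response's MIME type doesn't match the declared module type.
const expectedMime = new Map([
  ['javascript', 'text/javascript'],
  ['json', 'application/json'],
  ['css', 'text/css'],
]);

function verifyModuleType(declaredType, responseMime) {
  if (expectedMime.get(declaredType) !== responseMime) {
    throw new TypeError(
      `Declared type "${declaredType}" does not match response MIME "${responseMime}"`
    );
  }
}
```

Note the attribute never changes how the response is interpreted; it can only cause the load to fail.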

I agree that completions do not fully mitigate this compared to the previous plan of not having to declare anything at all, but this looks to me like the best option on the table.

I think it's important to note that so far, all the examples of fields we want have been of things inherent to the file being imported, not the import itself.

also given that it's impossible to statically figure out what worker constructors and import() are loading, it may not be possible to verify all the import locations.

bmeck commented

@bmeck that still seems inherent to the polyfill, you'd want it hoisted everywhere. unless you wanted to polyfill the polyfill, in which case you should probably change your entry point rather than trying to carefully hoist things in the magic order.

some bundlers, like webpack, might prefer data inline, like specifying the dimensions of an imported image.

bmeck commented

@devsnek for other things doing async operations in general, hoisting would be beneficial when possible. I'd argue that hoist isn't necessarily inherent to the dependency: it states that the dependency can eagerly evaluate prior to the current module, not that it is always safe to do so, e.g.:

import 'hoist-add-globalThis' with hoist: true;
import 'module-using-globalThis';
await null; // causes sibling deps to go in parallel

In this case, TLA chose to allow siblings to evaluate in parallel but we need to hoist the globalThis due to how these siblings are defined. Without hoist you would potentially have a race here.

Note that TLA will preserve the order of when siblings launch. So hoisting should be unnecessary in that particular case as long as the polyfill module doesn't use TLA itself.

bmeck commented

@littledan then assume it does use TLA ;p

@devsnek

also given that it's impossible to statically figure out what worker constructors and import() are loading, it may not be possible to verify all the import locations.

The idea would be that this verification happens reliably during runtime, so I think it should work just fine for dynamic import.

xtuc commented

I think that tooling that does transpiling could take advantage of the inline syntax, as discussed in #22, for many code-generation tuning purposes.

The "why not both" option is interesting. I can easily imagine a tool that could generate the inline syntax from the out-of-band file, or vice versa, for the developer.
I'm wondering how we would include this file on the web. An HTML script tag with a new attribute? Or could it be part of CSP headers?

bmeck commented

I think that tooling that does transpiling could take advantage of the inline syntax, as discussed in #22, for many code-generation tuning purposes.

Yes and no; it somewhat depends on the undefined behavior of what happens if attributes collide. If a tool generates something like a pure attribute to avoid side effects on load in one file, but not in another, inline may actually prove difficult, since it potentially requires keeping many locations in sync.

I'm wondering how we would include this file on the web? As a HTML script tag with a new attribute? or could it be part of CSP headers?

That seems up to web specs, not TC39. I can imagine some hosts may not even wish to allow out of band attributes (or vice versa).

xtuc commented

If some hosts only implement the inline syntax and others only the out-of-band format, it might end up being more confusing to the developer, especially if the two are getting out of sync.

That seems up to web specs, not TC39. I can imagine some hosts may not even wish to allow out of band attributes (or vice versa).

Yes, sure. It was just for my information, in case we had some ideas already.

bmeck commented

If some hosts only implement the inline syntax and others only the out-of-band format, it might end up being more confusing to the developer, especially if the two are getting out of sync.

I'm not sure I understand this; environments are already showing that they intend to get out of sync, with Node, Deno, and the web loading different types of modules and not matching. What is the thing we are wishing to preserve in light of already diverging scenarios?

@bmeck

That seems up to web specs, not TC39. I can imagine some hosts may not even wish to allow out of band attributes (or vice versa).

In this repository, I am specifically encouraging us to all discuss how this feature will relate to particular host environments, even though we probably won't make all the final decisions here. If we just talk about environments in the abstract, I think it will be too confusing and vague.

@bmeck

What is the thing we are wishing to preserve in light of already diverging scenarios?

Personally, I hope we can build as much compatibility across environments as possible, as we're discussing in #10 . I'm not sure how we could already conclude that things are diverging.

bmeck commented

@littledan Deno is loading TypeScript, Node is loading things with userland loaders, the web is loading HTML/CSS, etc. The module pipelines and formats being loaded differ significantly, and I don't see how they are intended to be framed as being on a common path. Even if we have some level of compatibility, it shouldn't be framed as something that cannot be broken, nor as something that lets the location of attributes be kept consistent across all the environments. I just don't understand how, or what, is able to be kept in sync. The proposal originated in a security concern from parts of the web and the loading model the web encourages, which does not universally apply to all environments either.

Can I both agree with #13 (comment) and hope that we can find a useful, interoperable subset among host environments that are interested?

bmeck commented

@littledan i'm wary of a recommended subset, as that will be leverage that environments use to impose things like their security model on other environments, which may have differences: they may see no need for the attributes or the ergonomics gained by enforcing them, and might even have problems implementing any attributes at all.

Following more offline discussion that @bmeck and I had, I'm convinced that this proposal should not recommend any sort of common, interoperable subset for hosts. I'm happy to leave the determination of semantics to the hosts, who can each find meanings that meet their goals. Some hosts may want to seem similar to each other, but that's well outside the scope of this proposal. Thanks for taking the time to explain the context to me, @bmeck .

xtuc commented

That makes sense to me. Hosts will likely tend to look similar, so that developers can rely on the same assumptions.

syg commented

This in-band vs. out-of-band vs. "why not both" issue is the most important point to figure out, to me. My own opinion isn't well-formed yet, but I have heard more arguments against in-band than out-of-band.

Arguments against in-band:

  • Encourages in-source embedder-specific information.
  • Decentralized, which presents both auditing difficulties and asks the question "why is metadata per-import site"?
  • Metadata around assets conceptually extends beyond just imports.

Arguments against out-of-band:

  • Bad DX.
  • Even more nebulous without a clear path forward.
  • What am I missing?

If bad DX is the main con for OOB, and tooling is being touted as a fix for ergonomics concerns for in-band, I'm not understanding why tooling is also not an answer for an OOB solution. As for having no clear path forward, that's more readily remedied if we're all aligned.

Arguments against in-band:

Encourages in-source embedder-specific information.

I personally think this is an argument for in band. The author is the one that has the expectations of what this kind of import must mean, and the semantics are defined by the host, so host-specific information is exactly what's needed. A separate system that's host-specific won't work if the host has different semantics for the import than what the author was expecting.

I'm not understanding why tooling is also not an answer for an OOB solution

Tooling may not be convenient or available. Consider transitive dependencies being pulled from a CDN like unpkg. Does the consumer of those modules have to run a tool that traverses the entire module graph and generates some manifest? That would severely impact the usability of this feature. We want authors to be able to reliably and safely publish modules that use CSS/HTML imports without forcing downstream consumers to jump through onerous hoops.

Edit: To add about OOB being onerous. Consider that you can import a module from unpkg directly in devtools now. We'd like those modules to be able to use CSS imports. Now what does the local developer have to do to be able to make this work when a transitive import uses CSS?

syg commented

Tooling may not be convenient or available. Consider transitive dependencies being pulled from a CDN like unpkg. Does the consumer of those modules have to run a tool that traverses the entire module graph and generates some manifest?

To make sure I understand: the reason that the CDN doesn't also serve the manifest is that that's the whole security model we're trying to address, that an untrusted server tells us how to interpret the payload?

Does the consumer of those modules have to run a tool that traverses the entire module graph and generates some manifest?

If you had to comb dependencies for your out-of-band data you'd also have to comb them for your in-band data. The difference is where you store your resulting information, not how you acquired it in the first place.

@syg in either case, you'd probably serve your metadata yourself; the difference is whether that data is served in your own code, or in a separate file from your own server (or in metadata in your HTML, or whatever, TBD).

To make sure I understand: the reason that the CDN doesn't also serve the manifest because that's the whole security model we're trying to address, that an untrusted server tells us how to interpret the payload?

Well, the CDN doesn't serve the manifest initially because it hasn't implemented that. CDNs, playgrounds, etc., would have to be updated to generate manifests. Right now the in-band solution would just work with unpkg, stackblitz, etc.

Even once they could generate a manifest, they'd presumably have to do it from in-band metadata (or some package-local OOB file) in the first place. Then who specifies that format?

bmeck commented

If transitive dependencies are brought into this line of thinking, aren't we also needing all the transitive deps to add these attributes? If the CDNs are possibly some of these untrusted origins, wouldn't the trusted origin still need to list all the allowed import attributes? e.g.

If my trusted origin uses CDN A's /a which loads CDN B's /b. Wouldn't the trusted origin need to trust CDN A to load CDN B's /b correctly? This seems like we still need the entire trust chain at the origin of these import graphs if the feature is based around preventing untrusted sources from loading things improperly.

I'm getting a bit confused here as to why this idea of trust for the imported resource format/side-effects only applies at the top level and not within the CDN itself if you are importing JS from the CDN.

This is exactly why integrity, preload metadata, and import maps resolution all need to be generated at the same time by the same tool.

If transitive dependencies are brought into this line of thinking, aren't we also needing all the transitive deps to add these attributes?

Well, yes, if they need them. Anyone that imports CSS would need to add the import attributes to allow that. Inline attributes just work in this case. OOB becomes much more difficult because you need a new tool to generate the attributes, and that tool either has to have global pre-knowledge of the module graph, or we have to allow for the OOB information to be updated in the presence of dynamic import(), new script tags, etc.

bmeck commented

@justinfagnani I'm stating that the security feature being claimed here doesn't make sense in transitive dependencies and is making the entire idea questionable if in-band is used in untrusted origins.

syg commented

Well I thought I maybe understood the threat model but wouldn't you know it, I actually don't.

@bmeck I'm very confused by this line of reasoning. I don't see why the type declaration would need to be transitive. We're not talking about the privilege to execute something of a certain type, but rather the importer giving its expectations of how its direct dependency is to be processed. (I don't see how this all relates to in-band vs out of band, though.)

@littledan for example, importing type:html that then imports js which infects my system or steals my cat pics or something. if the data is out of band, i can specify the rules for the imports of the imported type:html file and prevent that.

@devsnek This sounds like a really different threat model from the one presented in WICG/webcomponents#839 . My understanding is that, if you do import something that's executable, you are granting it trust to appropriately import other things. Presumably the manifest won't describe all the edges in the entire module graph, right?

@littledan that issue seems to have put html/css/json in the same category, which is why i brought it up. anne mentioned that html might be different, but even if it isn't, i think we should plan for more complex graphs.

@devsnek I think the difference had to do with the debate about whether HTML will or will not be able to execute JS code. The other issue is the different parsers. These are both shallow properties.

If you see an additional security issue to the one raised in WICG/webcomponents#839, that could be interesting to discuss, but I don't think it'd been presented in that thread, and I don't understand it from what people have been saying in this one.

@bmeck I think we're misunderstanding each other at this point. I'm not claiming that any in-band type attribute should apply transitively, indeed those should only apply locally to the annotated import. But such annotated imports should work anywhere in the module graph without knowledge or opt-in by the top-level module. ie, a transitively imported module with a properly annotated CSS import should just work.

That reasoning would suggest that import maps should be specifiable inline too. Why was an out of band mechanism used there?

Import maps answer an entirely different question from module attributes. A type: css attribute says that the importer expects the module to be CSS. This is a local assertion, and can't actually be a global one: no one else knows what the local importer was expecting.

import 'x' only says to import some abstract name x. The whole purpose of the bare specifier is to abstract over the real location and to leave the resolution up to something else. If the local importer already knows exactly which module to expect, then they can just use a URL. IOW, there is no such thing as an inline import map, because at that point you wouldn't need to map imports at all.
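For reference, an import map is exactly such a separate out-of-band resource: a JSON file that maps abstract specifiers to concrete locations, e.g.:

```json
{
  "imports": {
    "x": "/modules/x/index.js"
  }
}
```

The map answers "where does x live?", which is a graph-wide resolution question, not a per-import-site expectation like type.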

import 'x' only says to import some abstract name x. The whole purpose of the bare specifier is to abstract over the real location and to leave the resolution up to something else. If the local importer already knows exactly which module to expect, then they can just use a URL. IOW, there is no such thing as an inline import map, because at that point you wouldn't need to map imports at all.

But isn't this the point of import './any-specifier.ext';? It abstracts away the content of the module, leaving resolution up to the host environment.

For example, most people would not want to write import minimist from 'minimist' with type: commonjs;. The fact that it's CommonJS is a detail, resolved by the host.

The same is true for some (albeit not all) uses of importing JSON. For example, you might want to import some config without caring how it's constructed; the fact that some configs are simple enough to be expressed as JSON is a detail.

This is especially true of dynamic import (less so of static). It would require everything that wants to be imported to also carry around some arbitrary metadata declaring its kind, even though the consumer doesn't need to know or care.

For example, most people would not want to write import minimist from 'minimist' with type: commonjs;. The fact that it's CommonJS is a detail, resolved by the host.

I'm really confused by this. Being CommonJS is very much not an implementation detail because the semantics are quite different from the default import type of standard JS modules.

If you wanted to support importing CommonJS, this seems like an incredibly reasonable way to do it. Normally a CommonJS file would at the least fail to produce any exports, and would likely fail to eval due to missing require, module, exports, etc., definitions. Telling the host to provide them and parse the module differently seems like it'd let the host set up the environment for that import correctly. A browser, which wouldn't support that module type, would then fail with a reasonable error message.
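As a toy sketch of that idea (entirely invented; no host actually works this way), providing the missing definitions and evaluating the source could look like:

```js
// Toy sketch of the wrapping a host might perform when an import
// declares type: commonjs.
function evaluateCommonJS(sourceText) {
  const module = { exports: {} };
  const requireStub = (id) => {
    throw new Error(`require("${id}") is not wired up in this sketch`);
  };
  // Supply the module/exports/require definitions a bare CommonJS file
  // is missing, then evaluate the source as a function body.
  const fn = new Function('module', 'exports', 'require', sourceText);
  fn(module, module.exports, requireStub);
  return module.exports;
}
```

For example, evaluating the source text 'module.exports = 40 + 2;' would yield 42 as the module's exports.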

Being CommonJS is very much not an implementation detail because the semantics are quite different from the default import type of standard JS modules.

This might be a fundamental disagreement, but I think being CommonJS is strongly an implementation detail, because I only care about the function I want. Whether it's authored with CommonJS, ESM or even WASM makes zero difference to me as the consumer of the module; from minimist I just want a function that parses some arguments, and I don't care how it's implemented.

It also makes package upgrades require needless changes when otherwise no observable changes would be made. For example consider a simple math library:

import { add, sub } from 'math-package';

If math-package changes to use WASM for performance but exports an identical interface, under this proposal I would need to add with type: webassembly on my end. But this is beyond pointless for me as the consumer, given I don't care how add/sub are created; the with type: webassembly is just useless cruft that doesn't affect how I'm consuming the module in any way.

Honestly, I think the most likely outcome of this proposal for package authors is that they'll wind up adding a redundant module in between that just re-exports the other type. This doesn't tangibly benefit the package author or consumer, so it's not clear why we'd want it.

e.g.:

// main entry into package
export * from './math.wasm' with type: webassembly;

Did I miss the part where somebody suggested that the (proposed) type: metadata should be used to differentiate different JS module formats? I thought the discussion centered around type: css, type: html, etc.

bmeck commented

@thw0rted nothing in the current proposal disambiguates/differentiates formats; it is just an assertion of what is an allowable response type.

I understand that this remains a bit controversial, but I think we've articulated some strong reasons for putting at least some kinds of metadata in-band. Some in-band metadata here does not at all preclude out-of-band metadata from being added by some other proposal, whether in JS itself or host-specific.

This proposal remains based on adding in-band metadata. It's one of the core questions that I'd like to draw a conclusion on in proposing module attributes for Stage 2: consensus on Stage 2 would be settling on the tradeoff that we should add some in-band metadata.

I still believe that out of band, or the specifier, is a better choice than inline syntax. The module author, not the module consumer, should be determining the format of a module.

iow, in-band metadata on the consumer side is fine for anything the consumer is the arbiter of - but the module’s β€œtype” isn’t one of those things.

I really can't see inline as anything except a confusing burden. Take #58 for example: a seemingly reasonable request is probably impossible because the caching semantics of inline attributes are a mess, and anyone trying to use the data will probably end up with odd bugs.

Regarding this:

I still believe that out of band, or the specifier, is a better choice than inline syntax. The module author, not the module consumer, should be determining the format of a module.

iow, in-band metadata on the consumer side is fine for anything the consumer is the arbiter of - but the module’s β€œtype” isn’t one of those things.

I don't quite see it as making the consumer the arbiter of the type.

Even with the type attribute, the provider of the module can still make the final determination of the module's format. E.g. in the web the module is sent with a MIME type that the type attribute can't override. At most it will just make the module not load if it turns out to be a format that the importer didn't expect, but it will never change the format.

There are a couple issue threads (#43, #56) discussing ideas that would give the importer the ability to actually change the format of the module, but both of those are outside the scope of the core proposal.

@dandclark the other thing is that it forces the consumer to know the module's type, and i don't think that's knowledge they should have to have (nor something that should constrain the author's ability to transparently refactor a module between types).

The requirement to avoid the privilege escalation attack is that the consumer of the module declares with what privileges (executable vs non-executable) the module is loaded. That requirement exists whether it's stated in-band or out-of-band.

Fair, but "type" doesn't mean "privileges". If the requirements are to designate "executable", why is more than a boolean, or an explicit privilege, needed?

We're going in circles a bit; @syg proposed this in WICG/webcomponents#839 (comment) . I got the sense from that thread that many people preferred including the type. I'd be fine with switching to something like this within module attributes syntax; I think we could consider this change within Stage 2.

xtuc commented

We reached consensus for stage 2 with the inline form. Are we happy closing this issue?