WICG/webcomponents

HTML, CSS, and JSON modules shouldn't solely rely on MIME type to change parsing behavior

rniwa opened this issue · 107 comments

rniwa commented

As we discussed in the TPAC 2019 Web Components session, the current proposals / specs for HTML, CSS, and JSON modules do not specify the type of content in the import statement.

This is problematic because an import statement that intended to load CSS or JSON, and not to execute arbitrary scripts, could end up executing scripts if the destination server's MIME type changed or the destination server got compromised.

In general, we've made it so that the importer of any content can specify how the imported content should be parsed & processed. This is one of the motivations for adding CORS fetch for JSON as opposed to JSONP, for example.

I think there are two separate parts to this:

  • Treating some response as another/specific format
  • Preventing evaluation of unexpected formats

I personally think the second one should be part of Content-Security-Policy rather than changing anything about the current module loading e.g. Content-Security-Policy: modules self *, https://some.config.tld json, https://fonts.somewhere.tld css.

For the first one maybe we could extend the import: protocol to have import+css:/import+html/etc to force it into a specific format.

This is problematic because an import statement that intended to load CSS or JSON, and not to execute arbitrary scripts, could end up executing scripts if the destination server's MIME type changed or the destination server got compromised.

I believe this functionality is required for polyfilling: to support browsers that don't yet support HTML/JSON/CSS modules, a server can just respond with a corresponding JavaScript file that wraps the original content in a default export, which could provide a consistent code style on the consumer side.

rniwa commented

This is problematic because an import statement that intended to load CSS or JSON, and not to execute arbitrary scripts, could end up executing scripts if the destination server's MIME type changed or the destination server got compromised.

I believe this functionality is required for polyfilling: to support browsers that don't yet support HTML/JSON/CSS modules, a server can just respond with a corresponding JavaScript file that wraps the original content in a default export, which could provide a consistent code style on the consumer side.

Weakening the security model for the sake of polyfilling is an unacceptable trade off in our view.

Imports are inherently dangerous, and this is why CSP allows for restricting the origin for scripts. That should apply to all imports, regardless of type.

Importing JSON should only be done for trusted sources, and shouldn't in general be done for calling third party APIs (and arguably shouldn't be done for 1st party APIs if you don't want your module graph to fail to load if the API call fails). fetch() is much more appropriate for that.

Regarding server-side polyfills, this should still work if the client sends the appropriate Accept header.

rniwa commented

Imports are inherently dangerous, and this is why CSP allows for restricting the origin for scripts. That should apply to all imports, regardless of type.

Given that many websites don't use CSP correctly, relying on websites to correctly deploy CSP to get the right security behavior is not a great plan.

Furthermore, today, if you were to fetch JSON via XHR or fetch API and parse it via JSON.parse, there is no chance of the fetched JSON suddenly executing as scripts. This new module loading mechanism, therefore, is a functional regression from existing loading mechanisms.

We believe this security issue is a show stopper issue for HTML, CSS, and JSON modules.

Yeah, Mozilla does as well. Importing non-scripts should be safe by default. (If HTML modules end up executing script they might not necessarily be problematic.)

What is the counter proposal?

I think @justinfagnani's point here is interesting and I want to second it. Even aside from security concerns, JSON imports seem like the wrong tool for consuming non-first-party JSON/CSS. If the request 404's or if, say, the JSON has a parse error, the entire module graph will fail to instantiate/execute. fetch() is the right tool in this scenario, with import being reserved for content that is directly controlled by the importer. Otherwise all your module scripts could fail to run because of a missing { in third-party JSON...

That doesn't apply to import() afaik.

If the request 404's or if, say, the JSON has a parse error, the entire module graph will fail to instantiate/execute.

This might actually be what you want in some cases, e.g. if a critical resource fails to load then you don't want to waste resources evaluating a bunch of modules you don't need. You can always use import() to catch errors in that case specifically and run some recovery code or retry.
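That recovery pattern with import() might look like this (the specifier and fallback values are hypothetical; any unresolvable module rejects the same way):

```javascript
// Sketch: a failed critical resource rejects only this import(),
// not the whole static module graph, so we can fall back or retry.
async function loadConfig() {
  try {
    const mod = await import('./app-config.json'); // hypothetical path
    return mod.default;
  } catch (err) {
    // Recovery code: fall back to baked-in defaults.
    return { retries: 3 };
  }
}

loadConfig().then(cfg => console.log(cfg.retries)); // 3
```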

rniwa commented

I think @justinfagnani's point here is interesting and I want to second it. Even aside from security concerns, JSON imports seem like the wrong tool for consuming non-first-party JSON/CSS.

Regardless of whether it's a good idea or not, some web developers are inevitably going to do it. We shouldn't be adding a new foot gun to the Web and then expect authors to avoid using it in certain situations; that creates a new security surface.

I can only second @rniwa here, @justinfagnani's point included a lot of "should/shouldn't", and we all know how that goes. Importing non-scripts must be safe by default.

Is it OK for WebAssembly modules to have parsing behavior based on their MIME type? That's what the current proposal does.

HTML, CSS, and JSON modules do not specify the type of content in the import statement.

Are there any proposals / ideas that could solve these issues?

At the risk of suggesting something naive, could the file extension be the way that the import statement specifies its expected module type?

 // both the file extension and MIME type must be present for the non-script module to load
import 'file.json'
import 'file.css'
import 'file.html'

The obvious limitation of this approach is that you can't load json, css, and html modules from urls without the proper file extension. The advantage is that it doesn't require a rework of import statements and import maps to make html, css, and json modules work.

Apart from plugins, there's no precedent for putting meaning in file extensions on the web; I don't think we should start here.

rniwa commented

Is it OK for WebAssembly modules to have parsing behavior based on their MIME type? That's what the current proposal does.

That's not ideal but less problematic because the expectations of loading WASM & JS are similar: they execute arbitrary scripts. It's not so with CSS & JSON.

If we had syntax in JavaScript for asserting the MIME type (mandatory for JSON modules, and optional for JavaScript), would that address this concern? If so, we can look into this issue in TC39; I don't think we have considered it before. As a strawperson, it could look like this:

import document from "./foo.json" with mime: "application/json";

This could be a basis for adding sending metadata to the host environment (HTML, Node.js, etc). Thoughts? Did people have other specific ideas for what this should look like?

Nothing concrete was discussed, but yeah, something like that would address the concern. And equivalent for import(). Instead of the MIME type it might be nicer to assert it being "json" or "css" or some such.

The author is the only true arbiter of the parse goal of content; would this be an assertion, or would it change the parsing like script type does, for example?

@ljharb I imagine we would continue to use MIME type in conjunction with the declaration within JS. That's why I used the word "assertion".

how about the CSP idea but the default is disallow instead of allow? I think the layering of extending import syntax is a bit iffy, since it adds host-specific semantics to an otherwise generic bit of code, and then random hosts have to be like "wait what do i do with this"

It doesn't really seem host-specific to be able to import JSON without that resulting in script execution later on.

as an example, node requires you to write .json in the import specifier. we would have no use for such a syntactic extension. jumping a bit further, if someone writes import x as 'application/json', but a host doesn't use MIME types, what do they do with that?

Right, somehow, this is syntactically redundant in environments where the interpretation is implied by the module specifier's suffix already. The web doesn't have a tradition of making such judgements.

I suppose the analogous (and unprecedented?) thing here would be something like requiring that, if the MIME type is application/json, then the module specifier must end in .json, and prohibiting JS modules with this suffix. But such a scheme faces web compatibility issues with growing over time.

If we require this syntax for JSON modules on the web, I think there is some chance that a common authoring format will omit the assertions, and tools will insert it when generating web output as part of a build process. But there is also a chance that we can convince most people to write this directly.

sorry, to be more direct with what i'm trying to say: given security is host specific, shouldn't this assertion be out-of-band from the import? at worst you would have hosts ignoring a check people expected would be enforced.

Inferring meaning from a file extension is incompatible with the web's architecture. Requiring out-of-band annotations seems like it would create such bad ergonomics the feature would effectively not be used on the web.

I'm not necessarily suggesting using file extensions. Also, a lot of things on the web are out of band (csp in headers, import mapping in a html tag, etc). I'm not sure why this specifically would be worse. If you're worried about people not using it, you can default to disallow, as I keep saying.

I was assuming it was a given that it defaults to disallow, but that also makes it a pain to use the feature (having to set up CSP correctly) compared to fetching and parsing as JSON.

sure, but turning loading a module into a support matrix is a pain for everyone else. In node we've done everything out of band from import because of this, so I am hopeful the web can do the same.

Out of curiosity, were those at the TPAC 2019 General Session wanting to work through the security issues to see HTML, CSS, and JSON modules implemented?

Or was the feeling more that this was a showstopper that completely blocks those module specifications from going forward?

The responses in this github thread seem to shut down proposed solutions, without proposing an alternative solution. Which makes me think maybe those at the TPAC 2019 session are leaning towards shutting down these specifications entirely?

rniwa commented

Or was the feeling more that this was a showstopper that completely blocks those module specifications from going forward?

For WebKit, this is an absolute show stopper. We don't think we should ever implement support for HTML, CSS, or JSON modules without this issue being addressed adequately.

The responses in this github thread seem to shut down proposed solutions, without proposing an alternative solution. Which makes me think maybe those at the TPAC 2019 session are leaning towards shutting down these specifications entirely?

There are many standards proposals that are shut down due to privacy, security, and other considerations. It's usually the burden of those who are seeking to propose or add a new standard to come up with an appropriate solution, not of those who pointed out issues with the proposal.

So if extensions and out-of-band solutions are out, I think that leaves us with two basic approaches:

  • Non-extension, in-specifier syntax.
  • JavaScript syntax.

Does this sound right?

I heard that in-specifier syntax might have been discussed already. Can anyone shed some light on that? Was it something like

import styles from 'import+css:./styles.css';

For JavaScript syntax options, I presume this interacts with other requests over time, like SRI hashes for imports. Has there been a proposal for generic metadata options to be added to import directives?

@littledan

If we require this syntax for JSON modules on the web, I think there is some chance that a common authoring format will omit the assertions, and tools will insert it when generating web output as part of a build process. But there is also a chance that we can convince most people to write this directly.

Build steps happen at multiple points in the edit-to-production pipeline. A build step could insert these assertions to libraries before publishing to npm, for instance, which would be pretty invisible to package consumers.

For JavaScript syntax options, I presume this interacts with other requests over time, like SRI hashes for imports. Has there been a proposal for generic metadata options to be added to import directives?

Not that I know of. If someone is interested in championing this in TC39, I would be very happy to help them through the process.

Build steps happen at multiple points in the edit-to-production pipeline. A build step could insert these assertions to libraries before publishing to npm, for instance, which would be pretty invisible to package consumers.

No disagreement with you there. I don't have any particular value judgements on this potential end state.

I think the options are:

  1. A different identifier.
  2. Dedicated JavaScript syntax.
  3. Mandatory out-of-band annotation (via CSP or Import Maps or some such) for non-JavaScript/Wasm modules. (If HTML ends up with the ability to execute script we might not have to require an annotation there either.)

Does the full MIME type really need to be specified? It would be nice to provide notation that could also, in the future, support specifying inline data within a JS file, just as HTML can do with JS / CSS references.

I liked the idea proposed here:

import styles from './styles.css' as StyleSheet;

Later, once this is implemented, a follow-up proposal would be to allow inline text, which the browser could optimize around, knowing what format to expect:

import styleSheet from string `
div { color: green; }
` as StyleSheet;

Or was that proposal found problematic?

@annevk @rniwa was encoding the module type in the scheme discussed at TPAC? Is there a critical flaw in that approach that eliminates the option?

@justinfagnani that sounds like 1 above (a different identifier). See also #839 (comment). To reiterate, the security concern was raised in the meeting, there's no concrete alternative proposed, but as mentioned above there's a couple of potential paths.

Just to clarify, in TC39, we usually use the term "module specifier" for what I think @annevk is calling "identifier" in this thread.

FWIW, we are also having this same discussion about "import as type" in bundler land right now, specifically in Parcel. We don't have the same issues with security being discussed here, but do have a related issue where imported files could have multiple compiled representations that a user would like to choose from. See parcel-bundler/parcel#3477. It would be nice to land somewhere standard syntax-wise across both bundlers and web browsers.

Our current thinking, without changing JavaScript syntax, is to add something in the module specifier (likely a protocol) to signal the type to import. I see that this is also being proposed here. I would also be happy with an extension to the import syntax for this, but I'd imagine that would be harder to get standardized (especially the list of supported module types).

About new "JavaScript syntax" for the module specifier: we had many discussions at TC39 about the module specifier, first for importer metadata, then for builtin modules, and a few more during the last 5 years or so. Don't keep your hopes high on changing that; the feedback has been consistent: why does ES need to know about this? Isn't the flexibility of the module specifier sufficient?

Well, there's less flexibility now hosts have decided on a default behavior. And the option of using URL schemes for this seems rather ungainly.

I plan to raise this at TC39 in the December meeting, and hope to have a draft to share with this group some time in November. I agree that adding more syntax into the module specifier would be unfortunate, given the complexity of URLs as is.

I'm not sure how that's truly avoidable though, given the web's insistence that the consumer (instead of the author) should dictate the parse goal/format.

I'm not sure how that's truly avoidable though, given the web's insistence that the consumer (instead of the author) should dictate the parse goal/format.

"Insistence" is a pretty judgmental word. The Web also insists that the consumer knows the URL of the resource that the consumer is trying to load, as well as the exported symbols (i.e. all the non-keywords in import { shuffle } from 'https://foo.com/utils.js';). Knowing the type as well isn't really such a huge imposition.

then it seems like putting it in the specifier wouldn't be such a huge imposition either.

HTML, CSS, and JSON all have common usage extensions in IANA, could we just use those explicitly? e.g. importing JSON has to use .json and importing HTML has to use .html or .htm, and then the browser fails the resolve if the types don't match up.

Just to provide a use case: JSON in particular often won't have a .json extension, e.g. REST APIs. I imagine there would be a good use case for fetching JSON in SPAs.

I'm not sure how that's truly avoidable though, given the web's insistence that the consumer (instead of the author) should dictate the parse goal/format.

I believe the main issue described here is that there is often a middle layer here. Where a request may be served by a third party. Both the consumer and the author should be able to specify which parse goal they intend.

FWIW as an outside observer, a URL scheme seems appropriate to me. It's very similar to tooling such as webpack using json-loader!./file.json.

From upthread:

Inferring meaning from a file extension is incompatible with the web's architecture.

just to clarify, i'm saying to use it as an assertion, not as the determinant. in any case i think loading data from apis is motivating enough to throw the idea out.

Hmm, given the security issues here, do we even really need this added to the spec? Importing CSS, JSON, HTML, or any other file type seems pretty simple to do with top-level await and dynamic imports, or am I oversimplifying? 😄

Sorry, I meant to say fetch() in my last response, not dynamic imports, since that's basically what this issue is talking about.

People seem to like JSON and CSS modules in tools, and lots of frameworks have concepts of components in modules.
Lots of module loaders have a concept of JSON modules. If we can work things out here, we could further unify JS/web development and ensure the same construct works across environments and tools. I think this would be valuable for developers, but I'd be interested to hear if you feel otherwise.

Yeah, I think unification is awesome! I guess the only difference between what module loader libs do and this feature is that I've always used module loaders to load first-party files, not third-party files (although some may allow this). Is it worth exploring possibly scoping this effort to just first-party files that are relative to the root of a domain, or same domain, or same-origin (not sure about the right terminology here 😂)? I think when we try to incorporate dealing with third-party files, that's when security becomes more of an issue, right?

It would be great if, like CORS/ajax, importing JSON from the same domain Just Worked, and only different-domain URLs required an explicit/separate allow list.

@mkay581 importing CSS is extremely common in tooling-based solutions. The DX is an obvious win, and the loading perf win is worth it too.

We can't quite do the same with fetch() and top-level await either, because we'd need to fetch as a string, then parse async, and await on both steps, which is going to block dependent modules from even starting their fetch. There will be a waterfall of work up the tree that won't exist with native CSS modules.

syg commented

How do folks feel about solving the narrower problem of importing things that shouldn't execute, instead of the problem of importing things of different formats? That's the crux of the security concern as I understand it, not the need to support different formats.

I'm with @annevk that there's nothing host-specific about wanting JSON to not execute script. I like the idea of passing metadata via import to the host, but ISTM an NX bit should be directly conveyed.

To build on @littledan's earlier strawman syntax:

import doc from "./foo.json" with noexecute;

interesting... I wonder how that would play with node, where our synthetic module callbacks are written in js. does noexecute just disallow the resolved module from being a stm?

syg commented

I was imagining that the mime type be used as it does in the current JSON module proposal. If the mime type is instead something that causes code execution on the host, e.g. application/javascript, then an error is thrown.

Edit: I haven't thought very deeply about the timing of that error, which may be difficult. Not sure if surfacing it translates to disallowing the resolved module from being a STM.

@syg why do you think the no-execute bit should be directly conveyed? I suspect many developers will not really follow all of the security arguments that this thread opened with. By contrast, redundantly writing the type is intuitively meaningful (if annoying).

What about the suggestion from @ljharb ? Does following an approach similar to CORS change the security implications? It might be a nice v1 we can go with.

FWIW, I kinda like the "no execute" / "no side effects" statement.

@LarsDenBakker even if we make exceptions for same-origin (which I'd rather not), we need something that addresses cross-origin scenarios. Not all cross-origin is necessarily third-party either. CDNs for assets only are still a thing.

If we can get a change through to allow the with noexecute syntax, I wonder if directly supporting a host-defined type, like as json, might just be preferable. Then you can choose the representation in cases where multiple would apply. I've seen some ideas for as bytes before, which allows userland building of arbitrary objects, and could be applied to any kind of file.

ie:

import styles from './styles.css' as css;
import rawStyles from './styles.css' as bytes;

syg commented

why do you think the no-execute bit should be directly conveyed? I suspect many developers will not really follow all of the security arguments that this thread opened with. By contrast, redundantly writing the type is intuitively meaningful (if annoying).

@littledan Because it's the most direct expression of intent? I should like to think developers are very familiar with the concept of an executable permission bit, even if it's never come up in this context before. They don't need to follow the mechanics of the security concerns earlier, so long as they find it intuitive to understand that importing assets shouldn't run code. In the writing-out-the-format world, they'd still need to understand a reason that that redundancy is necessary, right?

Practically I see executable permission being orthogonal to formats. And besides, the design of a format thing is much more open ended to me (is it all a bunch of host hooks?), and might take much longer than directly addressing the security concern here.

@justinfagnani I'm not sure what host-defined type means in that case, but AFAIU that's solving a different (and harder, larger-scoped imo) problem of multiple representations and then hanging off those representations the "can execute" bit.

Don't get me wrong, I'm totally open to solving the multiple representations problem. What I'm missing is the desire to lump the "can execute" problem together with it.

rniwa commented

Execution vs. no execution is an important distinction, but changing the parser mode based on MIME type isn't great either. We've definitely seen security attacks that reused existing content by reinterpreting it in a different text encoding & MIME type.

@rniwa If you're concerned about the wrong parser being applied, and not just verifying permission to execute code, do you prefer an annotation that distinguishes WebAssembly from JS modules then?

rniwa commented

@rniwa If you're concerned about the wrong parser being applied, and not just verifying permission to execute code, do you prefer an annotation that distinguishes WebAssembly from JS modules then?

That would have been preferable.

I'm curious -- is all forward progress on non-JS modules going to cease until syntax is finalized?

As far as JSON imports, I wonder if it is out of the question to support the reviver parameter? If not, this might suggest a syntax (specific to JSON):

import doc from "./foo.json" with no reviver;
import doc from "./foo.json" with reviver (key, value) =>
  typeof value === 'number'
    ? value * 2 // return value * 2 for numbers
    : value;    // return everything else unchanged

@bahrus Wouldn't a 'function argument' like that be incompatible with the static import/export model? The import occurs after parsing of, but before evaluation of, the importer.

I'm not sure, @bathos. But it's possible the same arguments used here might apply (or maybe not).

The argument there that it's not necessary because it can already be done with another API seems applicable. The bar is pretty high for unique syntax and eval rules. The JSON.parse 'reviver' argument is just a mapping function; what's special about it?

Presumably the reviver function was added to JSON.parse, because it allows for a more efficient way of doing things like converting date strings to dates while parsing, rather than having to do so on a second pass.

The argument may not have carried as much weight in the case of fetch, since response.text() was (and is) available.

However, since importing JSON as text isn't an option (maybe it should be?), the reviver option seems more applicable.

Or maybe importing as text should be an option? So now we are back to supporting both:

import doc from "./foo.json" as string;

and

import doc from "./foo.json" as json;

, another example of what @justinfagnani was alluding to.

Both options (as string/json or with reviver) seem preferable to:

import doc from "./foo.json" with noexecute;

but not enough, in my mind to freeze progress on importing data in various forms.

it's the most direct expression of intent...
Practically I see executable permission being orthogonal to formats.
Don't get me wrong, I'm totally open to solving the multiple representations problem. What I'm missing is the desire to lump the "can execute" problem together with it.

I think this was covered upthread, but is TC39 dead set against the possibility of a new keyword? I agree that "noexecute" is the most direct expression of intent, but another way of looking at the problem is that we're overloading the meaning of import in the first place, to mean either "execute this script and give me one or more of its exports" or "parse this static, external content and give me an object representation of it". Was it a mistake to use one word for both meanings in the first place? Is it too late to consider an alternative?

In keeping with the Two Hard Problems, I'm hesitant to suggest a keyword name for the load-non-script-file operator. All my off-the-cuff ideas are pretty terrible, like get or import const. The best I could come up with was a static-keyword version of fetch that would make fetch vs fetch() work like import vs import(), but I doubt that's possible.

To elaborate further on the idea of explicit syntax for module types, a few of us are working together on a TC39 proposal to add this syntax. Feedback would be highly appreciated on https://github.com/littledan/proposal-module-attributes

doesn't failing open just leave people who forget these checks exposed? it only takes one slipup, and a malicious script could exfiltrate user data, exploit jit bugs, etc.

doesn't failing open just leave people who forget these checks exposed? it only takes one slipup, and a malicious script could exfiltrate user data, exploit jit bugs, etc.

I still think CSP is the way to go; it allows applying a policy across all code sites without accidentally missing one. Also, thinking about it more, this would be useful beyond modules, simply for enforcing that unknown content isn't being inserted into unexpected usage sites with fetch/etc.

Though to do this, Content-Security-Policy would need to support some kind of glob/regexp-esque matching, e.g. (glob style):

Content-Security-Policy: content-type *.json application/json, *.js text/javascript, *.css text/css, *.config application/json, https://foo.bar/* application/json, https://foo.bar/lib.js application/javascript

rniwa commented

See my comment at #839 (comment). CSP isn't an acceptable solution for this problem.

Given that many websites don't use CSP correctly, relying on websites to correctly deploy CSP to get the right security behavior is not a great plan.

This doesn't seem like a strong argument against CSP, this could easily be resolved by strengthening the CSP defaults and perhaps even disallowing type change in some way.

For example it could be required that all JSON modules were rejected by default unless some policy enables them (e.g. like the strawman one above).

e.g. Take this policy for example:

Content-Security-Policy: content-type https://foo.bar/*.json text/json

There's no way for foo.bar to upgrade anything of the form *.json to a script. The only way to allow it would be to change the policy to text/javascript, this seems as adequately secure as the with json approach.

Note that in such policies the type could not be wild-carded (e.g. https://foo.bar/*.json * would not be allowed, the allowed types would have to be enumerated).

rniwa commented

e.g. Take this policy for example:

Content-Security-Policy: content-type https://foo.bar/*.json text/json

There's no way for foo.bar to upgrade anything of the form *.json to a script. The only way to allow it would be to change the policy to text/javascript, this seems as adequately secure as the with json approach.

This approach is problematic because it adds more distance between where the type is declared & where it is used.

In addition, this proposal would make CSP directives affect how the served content is parsed (or not parsed), meaning that depending on CSP directives you may have on a website, the content may start executing where you were not expecting to execute. That's a really bad fit for a CSP directive, and in this regard, it's an actually worse proposal than not having any type annotation at all. CSP directives should be only used to enforce a security policy, not as a mechanism to change the way web browsers process content.

In addition, this proposal would make CSP directives affect how the served content is parsed (or not parsed), meaning that depending on CSP directives you may have on a website, the content may start executing where you were not expecting to execute.

I think there may be some confusion: I'm only proposing that *.json application/json be applied as an assertion; it does not actually change the type. Probably a better name would be allowed-content-types.

For example, if I have the policy *.json application/json and I fetch foo.json, then if its content type does not match application/json it will be rejected by the CSP, causing the attempted fetch to fail.

This doesn't change the semantics of the content; it just rejects the content if it doesn't meet some criteria (which is exactly what CSP does for other resources).

For imports specifically, a strict default could be applied that ensures that if a policy exists then the resource must actually match one of the policy's patterns to be accepted as a non-JavaScript type.*

* Ideally this would apply to all resources, not just imports, but it seems like a kinda large barrier to adopting JSON modules, which is why I'm suggesting doing it as a special, stricter case.
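A minimal sketch of that assertion semantics (the function and policy shapes here are hypothetical, just to pin the behavior down): the check runs against the response before the module system ever sees it, and a mismatch fails the fetch rather than reinterpreting the bytes.

```javascript
// Hypothetical "allowed-content-types" check. The policy maps URL
// suffixes to the content type the response must carry. It never
// changes how a resource is parsed; a mismatch simply causes the
// fetch to be rejected.
function passesAssertion(policy, url, responseContentType) {
  for (const [suffix, requiredType] of policy) {
    if (url.endsWith(suffix)) {
      return responseContentType === requiredType;
    }
  }
  return true; // no assertion covers this URL
}

// "*.json application/json" from the example above.
const policy = [[".json", "application/json"]];
```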

This approach is problematic because it adds more distance between where the type is declared & where it is used.

Although file extensions don't mean anything on the web, almost all developers would expect .json to be application/json. It seems only natural that such a policy would be enforced at a higher level, one that developers do not need to concern themselves with on a day-to-day basis.

The most likely scenario is that people just add it into a build tool to automatically transform. But this seems kinda pointless because developers will in no way be thinking more about the security with that approach than with the CSP approach.

rniwa commented

In addition, this proposal would make CSP directives affect how the served content is parsed (or not parsed), meaning that depending on CSP directives you may have on a website, the content may start executing where you were not expecting to execute.

I think there may be some confusion, I'm only proposing *.json application/json be applied as an assertion, it does not actually change the type, probably a better name would be allowed-content-types.

Okay, then your proposal is insufficient.

What if someone was not even aware of CSP? Or someone had limitation on server configurations they can change? Again, most websites don't deploy CSP correctly. As a general rule, we shouldn't be introducing new features that are vulnerable by default.

This approach is problematic because it adds more distance between where the type is declared & where it is used.

Although file extensions don't mean anything on the web, almost all developers would expect .json to be application/json. It seems only natural that such a policy would be enforced at a higher level, one that developers do not need to concern themselves with on a day-to-day basis.

The idea of using file extension has been rejected multiple times: see #839 (comment) for example.

Please be mindful of others' time and read the whole thread before making a proposal that has known issues.

The most likely scenario is that people just add it into a build tool to automatically transform. But this seems kinda pointless because developers will in no way be thinking more about the security with that approach than with the CSP approach.

I don't follow. Are you saying that people would be relying on build tools to generate CSP headers? That doesn't compute, because CSP headers are often set via server configurations.

This is, however, precisely why we don't want to rely on CSP. We want the content which gets served to the client to have its expected type hard-coded in its source. That's how we safeguard against random resources suddenly starting to execute arbitrary code.

For example, a service I created a while ago relies on JSON files being served by Apache inside a special directory. This directory is the only writable directory that's also publicly accessible in the entire service. All files in this directory are JSON and therefore are not executable (either in client or server side). If we started loading JSON files from this directory in the said service as JSON modules, it would be a strict security regression because now those JSON files could execute scripts if there were some bugs which allowed attackers to write to this directory.

What if someone was not even aware of CSP? Or someone had limitation on server configurations they can change? Again, most websites don't deploy CSP correctly

This is still something I say is kinda weak. Ideally new features should be developed in tandem with their available security policies which have strong defaults.

The fact that CSP is misused is because of the complexity of the policies. Things like hashes and stuff are difficult to set up and get right. I feel that this could be resolved by having stricter defaults (e.g. not allowing JSON modules by default) and safer configuration that doesn't encourage insecure things.

Or someone had limitation on server configurations they can change?

To address this point specifically, a strict default ensures no JSON modules can be used at all so the upgrade thing isn't a concern. Instead the concern becomes allowing JSON modules in such an environment where configuration is not allowed.

The idea of using file extension has been rejected multiple times: see #839 (comment) for example.
Please be mindful of others' time and read the whole thread before making a proposal that has known issues.

I'm not saying the web should use file extensions; I'm saying people already use file extensions, and they will use their tools to convert import foo from './foo.json'; into import foo from './foo.json' as json (or whatever syntax) regardless.

So if they're going to do this mapping automatically anyway why not allow them to configure it in the browser for their sites rather than adding pre-processing steps.

This is what my proposal does, it doesn't make browsers understand extensions. Instead it gives a way to make assertions about expected properties of resources (just content-type above but I think more is necessary for a concrete proposal) and means that people can be confident that content received is what is expected.

Note that my proposal is wider reaching than just imports, but this is because the issue imports have is that a resource might be served with an unexpected type. This is widely applicable, not just to imports but to all fetch sites.

For example, a service I created a while ago relies on JSON files being served by Apache inside a special directory. This directory is the only writable directory that's also publicly accessible in the entire service. All files in this directory are JSON and therefore are not executable (either in client or server side). If we started loading JSON files from this directory in the said service as JSON modules, it would be a strict security regression because now those JSON files could execute scripts if there were some bugs which allowed attackers to write to this directory.

My proposal still wouldn't allow this, but maybe I'm not being concrete enough to make that clear. I'll try to create a tangible algorithm that would be executed in relation to these policies such that the only way .json could be executed as a script is if the policy explicitly allowed it.


More concrete proposal:

Before JSON modules are introduced a new security policy is introduced that allows configuring both allowed content-type and destination of particular resources.

An example policy might look something like:

Content-Security-Policy:
  content-restrictions
  api/**/* content-type application/json; destination fetch,
  *.worker.js content-type text/javascript; destination worker
  *.js content-type text/javascript; destination script-like,
  */ content-type text/html; destination document,
  **/*.html content-type text/html; destination script document,
  **/*.json content-type application/json; destination fetch script,
  https://weather.tld/api.json content-type application/json; destination fetch script

(Q?: Should module be a separate destination to script?)

Some examples of applying this policy to some files:

| url | response content type | destination | first matched policy | blocked | blocked reason |
| --- | --- | --- | --- | --- | --- |
| `foo.json` | application/json | fetch | `**/*.json` | no | - |
| `foo.json` | text/javascript | fetch | `**/*.json` | yes | Content type isn't application/json |
| `foo.json` | application/json | script | `**/*.json` | no | - |
| `foo.json` | application/json | document | `**/*.json` | yes | Destination is not fetch or script |
| `script.js` | text/javascript | script | `**/*.js` | no | - |
| `script.js` | text/javascript | worker | `**/*.js` | no | - |
| `script.js` | text/html | document | `**/*.js` | yes | Wrong content type and destination |
| `https://weather.tld/api.json` | application/json | script | `https://weather.tld/api.json` | no | - |
| `https://weather.tld/api.json` | text/javascript | script | `https://weather.tld/api.json` | yes | Content type is unexpected |

If a resource is not in the policy then it is rejected. This ensures the web can add new targets and resources without large concern.
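A rough sketch of the matching algorithm behind that table (all names here are illustrative; nothing is spec'd): entries are consulted in order, the first URL match decides the outcome, and a resource matching no entry is rejected outright.

```javascript
// Policy entries are checked in order; the first URL match wins.
function evaluate(policy, url, contentType, destination) {
  for (const entry of policy) {
    if (!entry.pattern.test(url)) continue;
    if (!entry.contentTypes.includes(contentType)) {
      return { blocked: true, reason: "unexpected content type" };
    }
    if (!entry.destinations.includes(destination)) {
      return { blocked: true, reason: "unexpected destination" };
    }
    return { blocked: false };
  }
  // Strict default: resources not covered by any entry are rejected.
  return { blocked: true, reason: "no matching policy entry" };
}

// Mirrors the **/*.json rows of the table above.
const jsonPolicy = [{
  pattern: /\.json$/,
  contentTypes: ["application/json"],
  destinations: ["fetch", "script"],
}];
```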

JSON Modules:

Now in order for this to work for JSON modules, we should have a default policy that rejects all resources entering the script destination that do not have content type text/javascript.

For example a web-compatible default policy might roughly be:

Content-Security-Policy:
  content-restrictions
    // Restricts script to JavaScript modules only across all domains
    *:*/**/* content-type text/javascript; destination script-like
    // Anything else is allowed currently, as new features are added
    // we would want to strengthen the default policy across all domains
    *:*

(Q?: Allow wasm/html modules in such a default policy?)
(Q?: Is the status quo more strict?)

New Features:

When new features (not just JSON modules) are added that add/use new destinations/content-types(/(Q?: Other metadata?)) they could be added to a default policy.

Footnote:

Hopefully this is a bit more concrete than the vague hand-wavy things from above. It's still by no means a complete proposal and has some problems of its own that would need to be worked out. However, I think it solves the problems of JSON modules without introducing new syntax, without preventing use cases like import foo from './foo'; (where ./foo is resolved to .wasm or .js on the server side when it doesn't matter), without redundancy with node.js, and with consistency with CSP, while also reducing the complexity of CSP for both JSON modules and other use cases.


Alternative proposal:

This is similar to the above, but it might be worth superseding Content-Security-Policy with something like the above proposal that is more consistent and unifies things more, e.g. something like import maps but for security.

Strawman
<script type="security-policy">
  {
    "patterns": [
      {
        "match": ["**/*.json", "https://weather.tld/*.json"],
        "allowed-content-type": ["application/json"],
        "allowed-destination": ["fetch", "script"]
      },
      {
        "match": ["**/*.js"],
        "allowed-content-type": ["text/javascript"],
        "allowed-destination": ["script-like"]
      },
      {
        "match": ["**/*.webcomponent"],
        "allowed-content-type": ["text/html", "text/javascript"],
        "allowed-destination": ["script"]
      },
      {
        "match": ["https://www.third-party.com/component.js"],
        "allowed-content-type": ["text/javascript"], 
        "integrity": "asdf1234"
      }
    ]
  }
</script>
rniwa commented

What if someone was not even aware of CSP? Or someone had limitation on server configurations they can change? Again, most websites don't deploy CSP correctly

This is still something I say is kinda weak. Ideally new features should be developed in tandem with their available security policies which have strong defaults.

The fact that CSP is misused is because of the complexity of the policies. Things like hashes and stuff are difficult to set up and get right. I feel that this could be resolved by having stricter defaults (e.g. not allowing JSON modules by default) and safer configuration that doesn't encourage insecure things.

What is the mechanism by which JSON modules will be allowed then? If CSP, that would contradict your earlier statement that "I think there may be some confusion, I'm only proposing *.json application/json be applied as an assertion, it does not actually change the type".

If CSP directives indeed affect whether JSON modules are enabled or not, then my earlier observation stands: CSP directives would affect how the served content is parsed (or not parsed), meaning that depending on the CSP directives you have on a website, content may start executing where you were not expecting it to execute. That is not an acceptable solution to the problem at hand.

So if they're going to do this mapping automatically anyway why not allow them to configure it in the browser for their sites rather than adding pre-processing steps.

This is what my proposal does, it doesn't make browsers understand extensions. Instead it gives a way to make assertions about expected properties of resources (just content-type above but I think more is necessary for a concrete proposal) and means that people can be confident that content received is what is expected.

If your proposed mechanism to make browsers understand extensions is CSP, then that's not an acceptable solution as I have repeatedly stated.

In fact, any solution that involves out-of-band definitions of types is unacceptable. The type of content that gets loaded by an import statement must be defined in the same resource where the import statement appears. This is precisely the core security issue we're trying to resolve here. We don't want anything but the very resource which contains the import statement to define how the fetched resource is processed: not its header, not some kind of site-wide configuration, nor some other HTML / JS document which loads it.

That's a hard & absolute requirement for any solution to this problem.
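Concretely, the shape of inline declaration being required here looks something like the following (illustrative syntax only; whether it is spelled as json or as an attribute list is still an open question in this thread):

```js
// The importer pins the only type it will accept. A response served
// with any other Content-Type fails the load instead of being
// evaluated as script.
import config from "./config.json" with { type: "json" };
```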

@rniwa how do you square that with the reality that only the module author - not the module consumer - is the authority on how to parse a module? are you suggesting that every import (including wasm modules) should have this tax, that an explicit inline type be declared?

rniwa commented

@rniwa how do you square that with the reality that only the module author - not the module consumer - is the authority on how to parse a module? are you suggesting that every import (including wasm modules) should have this tax, that an explicit inline type be declared?

That's the only model that works. In fact, that's how Web works today. Look at how stylesheets, scripts, images, etc... are loaded with link & style element. They specify how the content should be processed; e.g. as a stylesheet.

We don't have a generic mechanism by which we fetch a resource and process it as a stylesheet and apply its style or process it as a script and execute it based on its content type. That would be fundamentally less secure.

That's not, however, how ES Modules were designed nor how they're used in practice.

To me, use of import is explicitly saying "this thing is safe to execute in the JS environment", and if i didn't want that, i wouldn't be using import.

i understand the want of importing json, it feels pretty smooth, but overall i have to agree with ljharb there. If performance is also a concern, i'd be curious why <link rel="preload" href="whatever" as="json"> combined with fetch(whatever) isn't sufficient.

rniwa commented

i understand the want of importing json, it feels pretty smooth, but overall i have to agree with ljharb there. If performance is also a concern, i'd be curious why <link rel="preload" href="whatever" as="json"> combined with fetch(whatever) isn't sufficient.

Abandoning HTML, CSS, JSON modules and using fetch instead is an acceptable solution here, to us at least, although folks who want this technology will probably push back on that.

Your security stance as indicated here imo kills any non-js modules from ever existing, which contradicts the browser-approved design of modules themselves :-/

What is the mechanism by which JSON modules will be allowed then? If CSP, that would contradict your earlier statement that "I think there may be some confusion, I'm only proposing *.json application/json be applied as an assertion, it does not actually change the type".

It is allowed if it passes the policy, the type is still entirely determined by the Content-Type header. Note that it is true that it changes whether the resource is parsed, but it does not change how it is parsed.

That's the only model that works. In fact, that's how Web works today. Look at how stylesheets, scripts, images, etc... are loaded with link & style element. They specify how the content should be processed; e.g. as a stylesheet.

This is only partially true; Content-Type decides things as well. For example, if I send a video, the Content-Type declares the decoder to use. Or if I use <script type="module" src="some-file"> and send text/python, then the browser will reject it as currently specified.

The entry point determines what to do with the resource; the Content-Type distinguishes how to process that resource for that particular goal. It's just the case that things like <link rel="stylesheet"> only support one type (in this case text/css).

import only tells us to load it as a "module"; it doesn't specify the type of module to load. The security implications are important to consider, but this is a matter of ensuring resources are what you expect them to be (and people expect .json to be application/json), so instead of enforcing this at every import site, why not enforce universal expectations (universal within a site, at least) in one place?

That's a hard & absolute requirement for any solution to this problem.

But you're defining the problem to be that developers need to have the parse type at the import site. That's circular: the problem isn't that; it's that we don't want an unexpected upgrade from JSON to JavaScript, and placing the declaration at every call site is just one way of preventing that.

rniwa commented

What is the mechanism by which JSON modules will be allowed then? If CSP, that would contradict your earlier statement that "I think there may be some confusion, I'm only proposing *.json application/json be applied as an assertion, it does not actually change the type".

It is allowed if it passes the policy, the type is still entirely determined by the Content-Type header.

Again, this is precisely what's insecure about the currently proposed model of HTML, CSS, & JSON modules. We can't rely on Content-Type to decide the processing mode.

Note that it is true that it changes whether the resource is parsed, but it does not change how it is parsed.

Ugh... disputing what words mean is the least productive thing we do in our industry. I could get academic and start using random jargon from the ECMA or HTML standards, but that's not gonna help anyone, so let's not do that.

The entry point determines what to do with the resource; the Content-Type distinguishes how to process that resource for that particular goal. It's just the case that things like <link rel="stylesheet"> only support one type (in this case text/css).

This in turn is only partially true, and that's why we had to make things like CORB.

import only tells us to load it as a "module"; it doesn't specify the type of module to load. The security implications are important to consider, but this is a matter of ensuring resources are what you expect them to be (and people expect .json to be application/json), so instead of enforcing this at every import site, why not enforce universal expectations (universal within a site, at least) in one place?

Because that's a worse security surface. Now each script deployed on a site must rely on some universal rules, which may or may not change in the future, to enforce such a policy. The author of each importer script has no way of enforcing what kind of module will be loaded.

That's a hard & absolute requirement for any solution to this problem.

But you're defining the problem to be that developers need to have the parse type at the import site. That's circular: the problem isn't that; it's that we don't want an unexpected upgrade from JSON to JavaScript, and placing the declaration at every call site is just one way of preventing that.

?? I'm saying that the importer must define what kind of module it's importing. There is nothing circular about it.

To me, use of import is explicitly saying "this thing is safe to execute in the JS environment", and if i didn't want that, i wouldn't be using import.

Sounds like an argument in favor of my previous suggestion to pick a keyword other than import for non-JS modules, to reinforce that they aren't safe to execute.

i'd be curious why <link rel="preload" href="whatever" as="json"> combined with fetch(whatever) isn't sufficient

Because we don't have async modules. I don't see any other way to support a pattern like

import { supportsBar } from "./config.json";

function foo() { ... }
function bar() { ... }

const xp = { foo, supportsBar };
if (supportsBar) { xp.bar = bar; }

export default xp;

With JSON modules, the above won't execute until the dependency (config.json) is loaded, and module loading will fail if the config file fails to load. (I'm aware this is an oversimplified example but variants on this pattern definitely do make sense.)

If you mean TLA, it's coming very soon: tc39/proposal-top-level-await#113

Oh wow, I didn't realize we were that close. So, my above would be replaced with const { supportsBar } = await fetch("./config.json").then(x=>x.json())? That really is the only use case I can think of. Maybe we don't need HTML/CSS/JSON modules after all?

Oh wow, I didn't realize we were that close. So, my above would be replaced with const { supportsBar } = await fetch("./config.json").then(x=>x.json())? That really is the only use case I can think of. Maybe we don't need HTML/CSS/JSON modules after all?

Top-level await is absolutely not a replacement for CSS and HTML modules.

Wide-spread use of TLA will slow down module loading unacceptably. If every component in a large module graph of components uses TLA to load its styles, then we get a waterfall of awaited fetches from the deeper modules in the graph on up. If a module loads multiple resources, it has to take care not to have a waterfall locally. The ergonomics are terrible compared to import. TLA is dangerous enough that there is movement to ban it in the ServiceWorker context, and I would expect linters and build tools to offer a mode to ban it in regular modules as well.

Also, TLA+fetch doesn't offer the additional and extremely useful module semantics like deduplication and import maps. Imagine a CSS file that's loaded into every component in an app:

const baseStyles = new CSSStyleSheet();
await baseStyles.replace(await (await fetch('../base/styles.css')).text());

First, that's absolutely horrific DX compared to import baseStyles from '../base/styles.css' as css;. Miss that first await and things break in a subtle way (the stylesheet updates in the background after the app logic has run, causing a flash of styling and a re-layout). Second, it doesn't benefit from deduplication at all. Every module that does this will cause a fetch of the base styles.

So now we have to add a cache, which further decreases the DX, so we add a wrapper library to load CSS and hope that every developer in the world uses the same CSS loader so they use the same cache. This screams for being a built-in due to all the pitfalls here.
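The wrapper in question is essentially a one-function memo over in-flight fetches; a minimal sketch (names are illustrative, and the fetching function is injected only to keep the sketch independent of the browser):

```javascript
// Hypothetical userland loader cache: every caller for the same URL
// shares one in-flight promise, so the underlying fetch happens at
// most once no matter how many components import the base styles.
const inflight = new Map();

function loadOnce(url, fetchText) {
  if (!inflight.has(url)) {
    inflight.set(url, fetchText(url));
  }
  return inflight.get(url); // same promise for every caller
}
```

Even then, deduplication only helps if every component in the app routes through this one function and this one cache, which is exactly the coordination problem a built-in avoids.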

Finally, this would leave the specific module semantics undefined and unimplemented, which leaves significant complexity to userland in the case of HTML modules, and precludes next steps in standardizing in-demand patterns around CSS modules (exporting class names, CSS references, potentially mixins, etc.).

Top-level await is absolutely not a replacement for CSS and HTML modules.

Don't worry, I wasn't suggesting that it was. Just for JSON resources.

TLA is dangerous enough that there is movement to ban it in the ServiceWorker context, and I would expect linters and build tools to offer a mode to ban it in regular modules as well.

That's because ServiceWorker stops registering listeners after the first tick, not because TLA is inherently evil. In fact, you can still dynamically import a TLA graph in ServiceWorker.

Is there a solution discussed anywhere already that can take inspiration from import maps to provide loaders declared out of band?

@tilgovi I think OOB has significant DX and usability downsides. See my comment here: tc39/proposal-import-attributes#13 (comment)

Is it OK for WebAssembly modules to have parsing behavior based on their MIME type? That's what the current proposal does.

That's not ideal but less problematic, because the expectations of loading WASM & JS are similar: both execute arbitrary code. It's not so with CSS & JSON.

Thinking about this a bit today... WASM doesn't have access to the DOM, correct? So an author could assume a WASM module has restricted access if it's not explicitly passed functions that allow it DOM access. If a file previously served as application/wasm were later served with application/javascript, would this present a similar security concern?

The not-yet-implemented-in-any-browser Wasm/ESM integration proposal gives Wasm the same level of privilege as JavaScript by design. This proposal allows importing arbitrary JS modules (including cross-origin), which could export functions that manipulate the DOM but have signatures which are just based on numerics, so it would be importable and usable from Wasm. The goal is to allow transparent interaction.