WICG/import-maps

Preloading and performance with client-based resolution

guybedford opened this issue · 21 comments

This package map implies a particular frontend workflow whereby dependencies are traced by some process to generate the manifest, which can then be used by the browser. A server can then serve resources without needing to know the exact resolutions of package boundaries, with the client providing those resolutions locally.

Performance considerations should be seen as a primary concern - so I'd like to discuss how this information asymmetry between the client and server will affect preloading workflows.

We've tackled these problems with SystemJS for some time, but instead of jumping to our solutions I want to start by clearly explaining the problems.

Edit: I've included an outline of how the SystemJS approach could be adapted to module maps.

Firstly, using the Link modulepreload header on servers with plain names like "lodash" will not be suitable when it is up to the client to determine where to resolve lodash.

Thus, any workflow using this approach will need to use client-based preloading techniques.

The cases for these are:

  1. Top-level <script type="module"> in an HTML page, varying between different HTML pages of the app.
  2. Dynamic import() statements in code.
  3. new Worker('x.js', { type: 'module' }) instantiations

All of these cases should support the ability to preload the full module tree to avoid the latency waterfall of module discovery.

The best techniques likely available in these cases will be:

  1. <link rel=modulepreload> for top-level module scripts
  2. Some kind of dynamic <link rel=modulepreload> injection for dynamic imports, using a custom JS function (a small helper of the kind sketched after this list)
  3. This same sort of dynamic <link rel=modulepreload> injection for workers.
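
For concreteness, the custom JS function in techniques 2 and 3 might be a small helper like this (dynamicPreload is a hypothetical name, not an existing API):

function dynamicPreload(url) {
  // Inject a <link rel="modulepreload"> so the browser starts fetching
  // the module ahead of its actual import.
  const link = document.createElement('link');
  link.rel = 'modulepreload';
  link.href = url;
  document.head.appendChild(link);
}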

There are problems with all these techniques:

  1. <link rel=modulepreload> information is now inlined into the HTML page itself (not a separate file). This means any resolution changes result in HTML changes, even if the manifest itself is a separate file. Also the preload information may be the same between different pages of an application but must be repeated in the HTML resulting in duplication.
  2. A dynamic JS-based module preloading method is fine, but usage might look something like: dynamicPreload('/exact/path/to/lodash.js'); dynamicPreload('/exact/path/to/lodash-dep.js'); import('thing-that-uses-lodash');. In this scenario we have now tied the exact resolution of the dependency graph to a source in the dependency graph, which is pretty much the same sort of work as inlining our resolutions into the module specifiers to begin with. We lose a lot of the caching benefit the package name maps provided in the first place - having resource caching independent of resolution caching. We are also duplicating a lot of tree information between dynamic imports in the app, creating code bloat.
  3. In cases where dynamic imports import dynamic specifiers - import(variableName) - it is not possible at all to inline the preloading information into the source, possibly making these scenarios worse for performance.

So my question is - how can these techniques be improved to ensure that the workflows around package name maps can be performant without losing the benefits?

Please also let me know if any of the above arguments are unclear and I will gladly clarify.

Firstly, using the Link modulepreload header on servers with plain names like "lodash" will not be suitable when it is up to the client to determine where to resolve lodash.

Why is this? A modulepreload header should probably admit package names somehow, allowing the client to then begin fetching the resource as resolved via the package name map.

Even if not, there's also no reason why a server can't utilize the very same map to resolve names to paths and still use a path in the header.
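
For illustration, here is the path form that works today next to the bare-name form being suggested (the second line is speculative syntax, not anything specified):

Link: </node_modules/lodash/lodash.js>; rel="modulepreload"
Link: <lodash>; rel="modulepreload"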

Thanks for putting some thought to this @justinfagnani.

So the issue is that while yes, the modulepreload can indicate to load lodash, it can't indicate to load any of the dependencies of lodash in turn - thereby creating a latency waterfall across all package boundaries. Package depth thus becomes a performance anti-pattern.

If the server uses the same map, that means the same server cannot serve multiple versions of the same application with the same files, with far-future expires headers on those individual versions.

It's my understanding that modulepreload already works this way with paths - it loads a module and its dependencies as it discovers them, all driven by the client. If that's correct, how does adding a name resolution step create any more of a waterfall?

I'll have to think about the multi-versioning issue more, but I wonder why the package name map wouldn't be versioned along with the app. In fact, it probably helps with versioning because the map can specify the versioned file paths completely independently of the module source. That way a non-leaf module doesn't need to change if it participates in two app versions with two different versions of dependencies - increasing cacheability.

@justinfagnani to solve the waterfall problem one needs to inline a modulepreload for each module in the tree. Otherwise one is only saving a single round trip of latency, when really we want to save n round trips, where n is the depth of the tree.

In the package name spec, the n here becomes the package name lookup depth of the graph (e.g. pkgA depending on pkgB depending on pkgC, where those resolutions are defined by the module map).
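
To make the waterfall concrete, suppose main.js imports pkgA, which imports pkgB, which imports pkgC. With discovery-driven loading that costs four sequential round trips:

fetch main.js -> parse -> discover pkgA
fetch pkgA    -> parse -> discover pkgB
fetch pkgB    -> parse -> discover pkgC
fetch pkgC

With full preload information, all four fetches can start in parallel.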

The aim here is optimal performance - unnecessary latency should always be avoided in the name of keeping the web fast.

Definitely versioned package maps are the way - but the point was that inlining link tags for deep dependencies means the link header for a given source would effectively inline part of the module map, which could then be out of sync with the actual module map: the server could inline one version of lodash's dependencies, while the client expects to use a different version of lodash.

If there isn't any further feedback on this topic, I'd like to share a draft proposal based on what we've done in SystemJS to speed up the loading waterfall.

Alright, so here's what we do in SystemJS and how that might apply here. The feature in SystemJS is called depCache and stores the list of unresolved dependencies for each unique module string. Whenever a module is loaded, we use this information to trigger preloading of the provided dependencies (effectively a dynamic injection of <link rel="modulepreload">, which is why I have argued strongly for modulepreload to be non-recursive - to avoid problematic functional complexity here).
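
For reference, a depCache configuration in SystemJS looks roughly like this (abbreviated, recalled from the SystemJS config API):

System.config({
  depCache: {
    // unresolved dependency lists, keyed by module name
    'app/main.js': ['lodash', './app-dep.js'],
    'lodash': ['./lodash-dep.js']
  }
});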

The reasoning here is that the build process that constructs the module map will have all the same information needed to construct the preloading graph, thereby avoiding the latency waterfall and solving the problem of optimized flat loading.

I really think this would be a huge win for module maps to help this performance problem.

Here is a rough idea of how such a thing might work:

Say I have a main script importing pkg1, which the module map resolves, where pkg1 then imports from pkg2, which again the module map resolves - so the preloading information depends on the module map.

We could then write something like:

{
  "path_prefix": "/node_modules",
  "packages": {
    "pkg1": { "main": "pkg1.js" }
  },
  "scopes": {
    "pkg1": {
      "packages": {
        "pkg2": { "path": "pkg2", "main": "pkg2.js" }
      }
    }
  },
  "preloadDependencies": {
    "pkg1/pkg1.js": ["pkg2", "./pkg1-dep.js"],
    "pkg2/pkg2.js": ["./pkg2-dep.js"]
  }
}

where the keys of preloadDependencies are URLs relative to path_prefix and, when matched, provide the unresolved specifiers of the dependencies to be preloaded. We run those specifiers through the main resolve algorithm again to work out what exactly to preload, then recursively apply preloading for those modules once resolved.
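
To sketch how this might be consumed (a rough illustration only - resolve() stands in for the package name map resolution algorithm and dynamicPreload() for the injection helper sketched earlier, both assumptions here):

const preloaded = new Set();

function preloadDependencies(map, moduleURL) {
  // preloadDependencies keys are URLs relative to path_prefix.
  const key = moduleURL.startsWith(map.path_prefix + '/')
    ? moduleURL.slice(map.path_prefix.length + 1)
    : moduleURL;
  for (const specifier of map.preloadDependencies[key] || []) {
    const resolvedURL = resolve(specifier, moduleURL); // assumed resolver
    if (preloaded.has(resolvedURL)) continue;
    preloaded.add(resolvedURL);
    dynamicPreload(resolvedURL); // inject <link rel="modulepreload">
    preloadDependencies(map, resolvedURL); // recurse so the whole tree is requested in parallel
  }
}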

This way, as soon as import "pkg" happens, or even import("pkg"), we can ensure that all dependency modules are requested in parallel, ensuring minimum latency.

It's a relatively simple feature to add, I feel, for the massive performance benefit to be gained.

@guybedford Your proposal makes a lot of sense to me. It is the client that ought to manage preloading. The client is already given the recipe/rules to perform resolution in the form of the name map generated by a tool. So we might as well supplement the name map with derived data to allow entry points to be fetched and parsed in parallel rather than discovered through a waterfall.

(Tangent: I'd love to find a way to have eager evaluation of side-effect-free children but alas the spec needs early errors for bad parsing of the full graph before we can start any evaling. Anyway, that's another story.)

This thread is massive and I haven't had time to read through all of the text in it. But skimming the latest it seems to run afoul of similar concerns to https://github.com/domenic/package-name-maps#supplying-out-of-band-metadata-for-each-module ; that is, this is not meant to be a solution for all modules, and is focused only on package name mappings. A separate proposal, not related to this one, is more appropriate for anything (like preloading) that applies to all modules.

I'm not suggesting this extend to any other out-of-band metadata here. This could even be restricted to only include package dependencies (preloading pkg2 while ignoring ./pkg1-dep.js in the example). The package name map is the only source of truth for the deep tree, so it is the only place we can solve deep preloading through package resolutions. Otherwise latency waterfalls will dominate module workflows.

I don't think it's fair to say that the package name map is the only source of truth for the deep tree, when I'm specifically pointing out that the intended source of truth for the deep tree is a separate manifest file unrelated to the package name map.

Perhaps it would help to flesh out how you see these performance problems getting solved then? I originally aimed for an open discussion around these concerns - specifically not to try to jump to solutions here but to discuss the problem space and how we can help users avoid the waterfall, as at the moment it will be the default behaviour. I'd suggest reading the thread though.

Via a separate manifest file, unrelated to packages, that lists all the modules in the graph.

I see, do you mean to suggest new spec work for this? Or something in userland?

Yes, new spec work. Possibly related to https://github.com/WICG/webpackage

Interesting. Would you expect such a preload manifest to inline full URLs, or might it be able to deal with preloading unresolved module specifiers? Sharing the resolution source of truth may cause sync issues between the two. Also, I hope that whatever route we take, modules can perform well from the get-go, so as not to tarnish their reputation when the numbers start landing.

Full URLs seem easiest, but if there's a nice way to ensure (perhaps with tooling?) that the two stay in sync, that would make sense.

I'm not sure what your concern is about performance from the get-go. Modules are an incremental project; browsers have been shipping pieces of them for a long time, and will continue to ship pieces as they become ready. There has been so far no desire to hold back on shipping module-related features until they are polished to perfection.

Full URLs seem easiest

Easiest if specifying a generalized preload manifest yes.

but if there's a nice way to ensure (perhaps with tooling?) that the two stay in sync, that would make sense.

So picturing a user workflow for the best performance, we're running some kind of npm-generate-package-map app.js and other-entry-points.js which then generates "preload.manifest" and "package.map" for us. I guess absolute URLs with a leading / would be ok in such a preload manifest though? Then we're adding those two manifests to the appropriate HTML pages.
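
The HTML wiring might then look something like this (the packagemap script type follows the explainer of the time; the preloadmanifest rel is purely hypothetical):

<script type="packagemap" src="package.map"></script>
<link rel="preloadmanifest" href="preload.manifest">
<script type="module" src="app.js"></script>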

That can work definitely, it would just be good to see real momentum on such a spec and a commitment to the production performance use case from this spec, to ensure these problems are solved.

I'm not sure what your concern is about performance from the get-go. Modules are an incremental project; browsers have been shipping pieces of them for a long time, and will continue to ship pieces as they become ready. There has been so far no desire to hold back on shipping module-related features until they are polished to perfection.

This argument doesn't really hold for me. Also never thought I'd hear someone from the Chrome team saying performance doesn't matter :)

Modules have been a development feature, and as such, yes performance has not been a concern.

For production modules features (which it seems like this one is trying to be), performance is a primary concern, and not an afterthought. Let's not fall back into old thinking just because performance hasn't been a concern for modules specs up to now. Production modules features are a very different thing.

If I've misunderstood and the goal is just to have a nice development experience then sure, it's not a concern.

The goal is to have a nice developer experience, and to separately continue our efforts to improve performance, so that modules can be used by larger and larger sites in production. Those efforts are orthogonal, so that in the end, you can have a nice developer experience in production.

As for commitment to such a spec, I'd welcome your efforts there; it's not the focus of this repository, or of my work at the moment.

I must admit this sounds a lot like passing the buck on the performance of this browser feature, which was really not what I was hoping to hear.

I work on these things voluntarily in my free time between consulting work, but if it really is going to come down to it, then in the name of the health of the web, if there is something I can do I can certainly look into it.

I guess something like <link rel="preloadmanifest" href="preload.json"> with a signature of { [url: string]: { rel: string, url: string }[] } would work. Any suggestions on where best to take this?
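
Spelled out, a manifest matching that signature might look like this (URLs purely illustrative):

{
  "/app/main.js": [
    { "rel": "modulepreload", "url": "/node_modules/lodash/lodash.js" },
    { "rel": "modulepreload", "url": "/node_modules/lodash-dep/lodash-dep.js" }
  ]
}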

I do believe the two belong together... Why would these two pieces of information be separate? (And besides the apparent logic, I'm also thinking of the "nice developer experience" goal...)

So, if the dependency graph belongs in the said webpackage spec, should the package name map belong there as well?

As the proposal has changed a lot, I am triaging old issues. I think even with the new proposal, the conclusions here still apply. Namely, this thread is requesting a separate feature, which is not our focus as part of the import maps proposal.

As noted in https://github.com/domenic/import-maps#supplying-out-of-band-metadata-for-each-module, the new proposal is tentatively a bit more connected to the idea of a "module manifest" than the previous one. But, I still lean toward that being a separate proposal, that should not be entangled into this one, for the reasons listed there.