LinkedInAttic/inject

Handling multiple baseURLs and multiple versions of a module

jakobo opened this issue · 11 comments

These ideas are based on http://jspm.io/

One of the challenges LinkedIn currently faces is modules that come from multiple locations. For example, if your app depends on the "global" jquery, you want to specify the origin at the front of the module string, which would allow for more than one base path. While RequireJS allows for multiple context objects (as does Inject technically), there really should be a cleaner solution.

The second challenge we're looking at is how to support versioning. Upgrading a shared library shouldn't force all dependencies to auto-update, especially if there is a breaking change.

JSPM solves this using the following syntax

origin:module/path#x.y.z

However, this isn't really compatible with the AMD standard.

There is a plugin developed by @guybedford at https://github.com/guybedford/amd-version but right in the readme it says this is a terrible idea.

1: For having code from different locations, simply use RequireJS paths configuration. The JSPM location syntax is identical to the RequireJS paths configuration underneath really.

Instead of:

origin:module

Use:

origin/module

Where:

  requirejs.config({
    paths: {
      origin: 'https://some-cdn.com'
    }
  });

I changed the syntax to the : in JSPM because it immediately makes it much more obvious to the end user what is going on, but underneath it is implemented in exactly the same way.
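To make the equivalence concrete, here is a minimal sketch of how a paths-style prefix could be expanded into a full location. The helper name `expandPath` is hypothetical, and the sketch only looks at the first path segment; it is not how RequireJS itself is implemented, and it leaves out baseUrl handling and the `.js` suffix.

```javascript
// Hypothetical helper: expand the first segment of a module id using a
// RequireJS-style `paths` map. Ids without a mapped prefix are returned as-is.
function expandPath(moduleId, paths) {
  const [head, ...rest] = moduleId.split('/');
  if (!Object.prototype.hasOwnProperty.call(paths, head)) {
    return moduleId; // no mapping: leave it for normal baseUrl resolution
  }
  return [paths[head], ...rest].join('/');
}

const paths = { origin: 'https://some-cdn.com' };
console.log(expandPath('origin/module', paths)); // https://some-cdn.com/module
```

Seen this way, the only difference between `origin:module` and `origin/module` is which separator marks the end of the prefix.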

2: For using multiple versions, this is also entirely compatible with RequireJS. The syntax only makes it look like something more complicated is going on. It is all done at the file level:

  - jquery.js
  - jquery@1.9.js
  - jquery@1.9.1.js
  - jquery@2.0.js
  - jquery@2.0.1.js

A folder structure like the above immediately allows the following requires:

  requirejs(['jquery']); // load the latest version of jquery
  requirejs(['jquery@1.9']); // load the latest 1.9 version of jquery
  requirejs(['jquery@1.9.1']); // load the exact 1.9.1 jquery version
  requirejs(['jquery@2.0']); // load the latest 2.0 version of jquery
  requirejs(['jquery@2.0.1']); // load the exact 2.0.1 version of jquery

The way this works is that the shortcut forms of jquery@1.9 and jquery are simply wrapper modules that point to the correct modules. Effectively an alias:

jquery.js:

  define(['./jquery@2.0.1'], function(m) { return m; });

jquery@1.9.js:

  define(['./jquery@1.9.1'], function(m) { return m; });

This same mapping can apply to modules that are in folders:

  - bootstrap@3.0.0
    - bootstrap.js
  - bootstrap@3.0.1
    - bootstrap.js
  - bootstrap.js
  - bootstrap@3.0.js
  - bootstrap@3.0.1.js
  - bootstrap@3.0.0.js

The shortcut modules here are the js files in the base folder. For example:

bootstrap@3.0.0.js:

  define(['./bootstrap@3.0.0/bootstrap'], function(m) { return m; });

bootstrap.js:

  define(['./bootstrap@3.0.1'], function(m) { return m; });

This way the main entry points get mapped naturally, without needing global configuration, and versions all work as expected.

Over time as updates are made, these wrapper modules can be updated to point to the newer versions as appropriate.
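As a rough illustration of how those wrapper modules could be kept up to date, the sketch below derives the alias targets from a list of exact versions and renders the wrapper source. The function names (`buildAliases`, `wrapperSource`) are made up for this example, and it assumes plain `x.y.z` versions with no prerelease tags.

```javascript
// Compare two "x.y.z" version strings numerically.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Hypothetical: map every alias ("name" and "name@x.y") to the exact
// version module it should point to.
function buildAliases(name, versions) {
  const aliases = {};
  // Walk versions in ascending order so higher versions overwrite lower ones.
  for (const v of [...versions].sort(compareVersions)) {
    const [major, minor] = v.split('.');
    aliases[`${name}@${major}.${minor}`] = `${name}@${v}`;
    aliases[name] = `${name}@${v}`;
  }
  return aliases;
}

// Render the one-line wrapper module for a given target.
function wrapperSource(target) {
  return `define(['./${target}'], function(m) { return m; });`;
}

const aliases = buildAliases('jquery', ['1.9.0', '1.9.1', '2.0.0', '2.0.1']);
console.log(aliases['jquery@1.9']);          // jquery@1.9.1
console.log(wrapperSource(aliases.jquery));  // define(['./jquery@2.0.1'], function(m) { return m; });
```

Publishing a new exact version then only means regenerating the affected wrapper files.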

I hope that helps somewhat and makes some sense. It seems the most natural method for version management to me with module systems. I do need to get a better write up of this together to get more people interested in it, would be interested to hear your thoughts.

On remapping paths:
That seems very straightforward when explained that way. I believe we can achieve something similar in Inject by using the fileRule syntax to support either jspm or AMD style modules (using common config is on our roadmap). For Inject, we'd be using code similar to the below.

Inject.addFileRule(/^origin(?:[\/])/, function(path) {
  return 'https://some-cdn.com/' + path.replace(/^origin(?:[\/])/, '');
});

Once we have common config support, mapping the paths in to addFileRule directives should be pretty straightforward.
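A sketch of what that mapping might look like: each entry in a RequireJS-style paths object becomes one rule. The registrar is passed in as `addRule` so the sketch does not depend on any loader; with Inject it would presumably be `Inject.addFileRule`, called with the same (pattern, function) shape as in the snippet above. `pathsToFileRules` and `escapeRegExp` are names invented for this example.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical: turn { prefix: baseUrl } entries into (pattern, rewrite) rules.
function pathsToFileRules(paths, addRule) {
  for (const [prefix, base] of Object.entries(paths)) {
    // Only match the prefix when it is a whole leading path segment.
    const pattern = new RegExp('^' + escapeRegExp(prefix) + '/');
    addRule(pattern, function (path) {
      // Drop any trailing slash on the base so the join never doubles it.
      return base.replace(/\/$/, '') + '/' + path.replace(pattern, '');
    });
  }
}

// Usage with a stand-in registrar that just collects the rules.
const rules = [];
pathsToFileRules({ origin: 'https://some-cdn.com' }, (pattern, fn) => rules.push({ pattern, fn }));
console.log(rules[0].fn('origin/module')); // https://some-cdn.com/module
```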

On versioning:
The idea of using modules makes a lot of sense. At LinkedIn, we've been grappling with two main issues: "nearest" and "cdn cache". While having the intermediate modules definitely helps with the nearest / best module problem, I'm not sure quite how to solve the CDN cache one.

Let's use the bootstrap example:

  - bootstrap@3.0.0
    - bootstrap.js
  - bootstrap@3.0.1
    - bootstrap.js
  - bootstrap.js
  - bootstrap@3.0.js
  - bootstrap@3.0.1.js
  - bootstrap@3.0.0.js

And the bootstrap@3.0.js file

define(['./bootstrap@3.0.1'], function(m) { return m; });

When bootstrap 3.0.2 is released, we will update the bootstrap@3.0.js file. However, without some form of CDN cache busting, 3.0.1 will continue to get returned to clients. I'd be curious to hear your thoughts on the best practice for cache invalidation in that scenario.

In the runtime version, using a SemVer "nearest" means that the loader could decide at runtime which version was the best version to supply. A request for bootstrap@3.0 would get remapped to bootstrap@3.0.2 before any http requests were made.
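A minimal sketch of that runtime remapping, assuming the loader already has a list of known exact versions available (how it would obtain that list is not covered here). The function name `nearest` is hypothetical and only prefix requests such as `3.0` or `3` are handled.

```javascript
// Compare two "x.y.z" version strings numerically.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Hypothetical: pick the highest known version whose leading components
// match the requested partial version. Returns null when nothing matches.
function nearest(requested, available) {
  const want = requested.split('.');
  const matches = available.filter((v) => {
    const have = v.split('.');
    return want.every((part, i) => have[i] === part);
  });
  if (matches.length === 0) return null;
  return matches.sort(compareVersions)[matches.length - 1];
}

const known = ['3.0.0', '3.0.1', '3.0.2', '3.1.0'];
console.log('bootstrap@' + nearest('3.0', known)); // bootstrap@3.0.2
```

Because the remapping happens before any request is made, no pointer file has to expire from a CDN cache for clients to pick up 3.0.2.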

Thanks Guy for all the feedback. It's been a huge help as we think through how to do some of this stuff over at LinkedIn.

The path mapping sounds about right to me there.

As for the version cache, the JSPM server uses the following cache rules in the browser:

  • Main version module (just a pointer): 1 hour
  • Minor version module (just a pointer): 1 day
  • Exact version module (the module itself): 1 week

The exact version module itself can occasionally change (for example if a security fix is backported or something odd like that). In an earlier version I set this expiry to years, but it turned out to be hard to manage!

Then the files also use etags as well to minimise duplicate payloads. Perhaps in your scenario, the pointer modules could also provide a hash that could be compared in case the 1 week expiry on the exact version is up, to save redownloading.
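For illustration, the three tiers above could be expressed as a function from module id to a Cache-Control value. This is a sketch, not the jspm server's code: `maxAgeFor` and `cacheControl` are invented names, and it assumes an unversioned id is the "main" pointer and any id with fewer than three version components is a "minor" pointer.

```javascript
const HOUR = 60 * 60;   // seconds
const DAY = 24 * HOUR;
const WEEK = 7 * DAY;

// Hypothetical: choose a max-age (in seconds) from the shape of the module id.
function maxAgeFor(moduleId) {
  const at = moduleId.lastIndexOf('@');
  if (at === -1) return HOUR;                      // e.g. "jquery": main pointer
  const parts = moduleId.slice(at + 1).split('.');
  return parts.length >= 3 ? WEEK : DAY;           // exact version vs. minor pointer
}

function cacheControl(moduleId) {
  return 'max-age=' + maxAgeFor(moduleId);
}

console.log(cacheControl('jquery'));       // max-age=3600
console.log(cacheControl('jquery@1.9'));   // max-age=86400
console.log(cacheControl('jquery@1.9.1')); // max-age=604800
```

The shorter the lifetime of a pointer, the sooner clients see a new release, at the cost of more revalidation requests.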

Always happy to discuss. I'm tempted to write up these ideas as some kind of "package version convention", still looking for the right way to communicate it. Using these version conventions is the primary reason I don't use Bower, and instead wrote a custom CLI. Browser package management needs a flat version system like this with version-suffixed names.

There has been a recent change in the ES6 loader specification, which allows the normalize function to be asynchronous. This opens up some new possibilities for version resolution, which I'm currently considering.

Since it is in line with the same possibilities discussed here already, I thought it may benefit to share the considerations here, and I would also really appreciate hearing from your experiences. If I am crowding the issue, or if this isn't useful, please don't hesitate to remove this - it's very much on the off chance that it might be useful to discuss this here.

Basically, the initial module @jakobo found (https://github.com/guybedford/amd-version) that used semvers in the version URL is now possible again for ES6 module loaders.

That means one could write:

  loader.import('jquery@>=1.6.2')
  // or
  loader.import('jquery@~1.8')

This wasn't possible until now since the normalize function was previously synchronous. This allows an initial request to get the package configuration and metadata (main, shim, dependency information), which can then inform a normalization to an exact version and hence URL to resolve to.

The issue with a system like this is that it is a greedy algorithm: if a matching version is already in the registry, it uses it; if not, it loads the most recent module matching the version constraints.

So in the example above, the load orders would result in jquery@2.0.3 loaded first, and then another instance, jquery@1.8.3 would be loaded second. This is the non-optimality showing.
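The order dependence can be reproduced with a small sketch of the greedy step. Everything here is illustrative: the list of published versions is assumed, `greedyResolve` is a made-up name, and `satisfies` only understands the two range forms used in the example (`>=x.y.z` and `~x.y`), not full semver.

```javascript
function parse(v) {
  return v.split('.').map(Number);
}

function compareVersions(a, b) {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    const d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return d;
  }
  return 0;
}

// Supports only ">=x.y.z" and "~x.y"; anything else is treated as an exact match.
function satisfies(version, range) {
  if (range.startsWith('>=')) return compareVersions(version, range.slice(2)) >= 0;
  if (range.startsWith('~')) {
    const [major, minor] = parse(range.slice(1));
    const [vMajor, vMinor] = parse(version);
    return vMajor === major && vMinor === minor;
  }
  return version === range;
}

// Greedy step: reuse an already loaded version if one fits,
// otherwise load the newest published version that fits.
function greedyResolve(range, published, registry) {
  const loaded = registry.find((v) => satisfies(v, range));
  if (loaded) return loaded;
  const candidates = published.filter((v) => satisfies(v, range)).sort(compareVersions);
  if (candidates.length === 0) throw new Error('No version satisfies ' + range);
  const newest = candidates[candidates.length - 1];
  registry.push(newest);
  return newest;
}

const published = ['1.6.2', '1.8.2', '1.8.3', '2.0.3']; // assumed for the example
const registry = [];
console.log(greedyResolve('>=1.6.2', published, registry)); // 2.0.3
console.log(greedyResolve('~1.8', published, registry));    // 1.8.3
console.log(registry.length);                               // 2
```

Resolving the same two ranges in the opposite order would load 1.8.3 first and then reuse it for `>=1.6.2`, so only one copy would be loaded. That sensitivity to request order is what makes the greedy approach non-optimal.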

More optimal versions can be analyzed globally through testing / checking the whole tree and then locked down at a global configuration level:

  loader.config({
    versions: {
      'jquery': 'jquery@1.8.2'
    }
  });

Then the same requires as above would only need to use a single jQuery version.

Such a version lock down would be used in production, allowing the development process to be free of version worries, while the production system doesn't get slowed down by the need for package meta requests.

Alternatives to this approach include the "minor version" system described above, or maintaining semver ranges through package.json files at a static level only and resolving exact versions in a separate build step, as is done with Browserify for example.

It would be great to hear what key use cases are being catered to on your side, in order to understand what might be the most worthwhile approach here. Thanks for your time.

Thanks @guybedford for the update, I hadn't gone to read the new changes in the ES6 loader spec yet. At LinkedIn, we are grappling with multiple versions of jQuery; specifically the application may want one version of jQuery, and the outer page (we call it the "chrome") may need a different version. SemVer ranges allow us to hopefully map multiple jQuery requests down to a least common denominator, or at least an already-resolved compatible version.

Since the "chrome" and the application are in two separate build steps, it's very difficult to make them aware of each other at build time. As a crutch, we opted to explore a runtime calculation of dependencies. Given the above example with jQuery 2.0.3 and 1.8.2, the non-optimal case would load jQuery twice. However, such a system would be aware it was loading jQuery a second time, and could notify external systems that the event occurred. For LinkedIn, the performance hit due to mismanaged dependencies outweighs breakages due to backwards incompatible APIs.

@jakobo thanks so much for your feedback. It does sound like, to ensure the maximum portability of a module, inlining semver ranges into the requires themselves may well make a lot of sense. In this way, multiple module bundles can find their own common dependency solutions through application testing, which can then be locked down in production through global configuration.

It looks like it would be worthwhile to pursue these directions for jspm, so thanks very much for sharing your considerations.

Another alternative we've been playing with on the Inject side is using annotations as a sort of "inline package.json" file. In the AMD world, the file would look something like

/**
 * @amd
 * @depends jquery@~1.8.x
 */
define(['jquery'], function($) {
  // ...
});

For regular AMD systems that don't know any better, things would work the same as they always had. For Inject since it's an XHR+eval system (versus a script tag injection system) we could actually parse out the @depends annotations and make an on-the-fly change if we already had a compatible version loaded or in cache.
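As a sketch of the parsing half of that idea, the function below pulls `@depends name@range` tags out of block comments in a module's source text. The name `parseDepends` is hypothetical, this is not Inject's actual code, and it assumes package names without an `@` in them.

```javascript
// Hypothetical: extract "@depends name@range" annotations from /** ... */ comments.
// Returns an object mapping package name to the requested range string.
function parseDepends(source) {
  const deps = {};
  const blockComments = source.match(/\/\*\*[\s\S]*?\*\//g) || [];
  for (const block of blockComments) {
    const tagPattern = /@depends\s+([^\s@]+)@(\S+)/g;
    let m;
    while ((m = tagPattern.exec(block)) !== null) {
      deps[m[1]] = m[2];
    }
  }
  return deps;
}

const source = [
  '/**',
  ' * @amd',
  ' * @depends jquery@~1.8.x',
  ' */',
  "define(['jquery'], function($) {});"
].join('\n');

console.log(parseDepends(source)); // { jquery: '~1.8.x' }
```

An XHR+eval loader has the source text in hand before evaluating it, so it could run a step like this and then decide which concrete version to supply for `jquery`.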

I find it interesting that you use 1.8.x in your example. This is supported fine by the minor version system described previously. Do you really see scenarios where more advanced semver ranges would definitely be a help in a dependency managed system?

I've considered inline information in that way many times, but have always kept away from it, as it doesn't seem right. Thinking about it here are some reasons against such a system:

  • Dependencies don't apply to files, they apply to entire packages. One shouldn't need to maintain a copy of the version (jquery@~1.8 above) in each file of a package. By having a single central config where this is contained, it is easier to change, and easier to set.
  • Minification could easily remove this. I try to use strings for config for these reasons, eg "@depends jquery@~1.8.x";

I think the main thing is that package configuration should be a global for all files of that package - containing information like the shim, module format, main entry point, and dependency versions.

Up until now I've statically built this information into all the files at deployment. (eg rewriting define(['jquery']) into define(['jquery@~1.8.x']) above). But I am very much liking this idea of a separate request for package configuration that is used in development only. Still trying to weigh it all up though.

I just wanted to update on the route I've taken here with work on jspm.

The core of it is to focus purely on what it means for two versions to be semver-compatible. In NPM version ranges, this is described by the caret operator (^) (see the definition at https://github.com/isaacs/node-semver#ranges).

The insight is that the caret operator can be fully implemented from the client, assuming only the ability to query jquery@1.9.3 and jquery@1.9 and jquery@1 style version responses, and to compare two versions.

See the further description and implementation in https://github.com/guybedford/systemjs/blob/master/lib/system-versions.js

This avoids complex semver ranges and normalization rules, making for a lightweight system that can handle the needs of dynamic dependency-managed applications, provided they properly implement semver.
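To illustrate what "semver-compatible" means under the caret operator, here is a small sketch of the comparison, following the node-semver definition linked above: the leftmost non-zero component must stay the same and the candidate must not be older than the requested version. It is not the code from system-versions.js; `caretCompatible` and `pointerFor` are invented names, and `pointerFor` reflects only my reading of which pointer module a client would query. Both assume full `x.y.z` versions without prerelease tags.

```javascript
function parse(v) {
  return v.split('.').map(Number);
}

// True when `candidate` falls inside ^requested.
function caretCompatible(requested, candidate) {
  const r = parse(requested);
  const c = parse(candidate);
  // The candidate must not be older than the requested version.
  for (let i = 0; i < 3; i++) {
    if (c[i] !== r[i]) {
      if (c[i] < r[i]) return false;
      break;
    }
  }
  // Everything up to and including the leftmost non-zero component must match.
  const pin = r.findIndex((n) => n !== 0);
  const upTo = pin === -1 ? 2 : pin;
  for (let i = 0; i <= upTo; i++) {
    if (c[i] !== r[i]) return false;
  }
  return true;
}

// Assumption: the pointer module that covers the whole caret range.
function pointerFor(name, requested) {
  const [major, minor] = parse(requested);
  if (major > 0) return name + '@' + major;
  if (minor > 0) return name + '@0.' + minor;
  return name + '@' + requested;
}

console.log(caretCompatible('1.9.3', '1.10.0')); // true
console.log(caretCompatible('1.9.3', '2.0.0'));  // false
console.log(caretCompatible('0.2.3', '0.3.0'));  // false
console.log(pointerFor('jquery', '1.9.3'));      // jquery@1
```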

I'm happy to discuss further if you are interested, I just didn't want to go into a complex explanation here unnecessarily.

Hi Guy, thanks for the update! We've been multitasking over here at LinkedIn, so we haven't come back to the semver stuff for a while.

I think you hit the heart of semver. "I want a version of X that works with my code, but I don't actually care which version I get".

(also, by using the regex, the whole semver piece becomes a lot simpler)

Thanks for the update as to where jspm.io is going.

Going to close this, as folks should be using SystemJS if they have a hard requirement on runtime loading, or Webpack/Rollup/Parcel/Broccoli/etc if they're comfortable with the bundling, tree shaking, and dynamic loading those packagers provide.