Financial-Times/dotcom-page-kit

Code splitting for FT.com

Closed this issue · 8 comments

This issue is a catch-all for our discussions around developing a code splitting strategy for delivering JavaScript to our users of FT.com.

The reason we wish to enable code-splitting is to increase the cache-ability of our client-side code. JavaScript kilobytes are the most expensive so leveraging caching more is a key component of Anvil's performance strategy and meeting our success metric of decreasing time to interactive.

By splitting our JavaScript code effectively this should improve caching by:

  • Decreasing the number of times we bust the cache - small changes to one piece of code should not force a user to re-download everything.
  • Enable increased code reuse between our user-facing microservices - if the front page, stream pages, and article pages all need o-date@3.0.1, then we should aim to reuse that package across all of them. A user journey across these apps will be faster with assets ready in a warm cache.

Webpack offers very powerful code splitting tools which should have the flexibility for us to build an appropriate strategy on top of. We have a basic example of this code splitting feature already which demonstrates how to split all of the npm packages required by a bundle into separate files but this is not an appropriate strategy for FT.com because:

  1. Our apps can have hundreds of separate dependencies. Handling many small files is slower than sending one big bundle (and may limit effective usage of resource hint headers?)
  2. Some of our apps have multiple entry points (usually with the intention of lazy loading them) but we currently lack suitable tooling to reassemble these at runtime if split into pieces.

There are also some other things to consider:

  • One a limitation of Webpack's runtime (a small module used when leveraging code splitting which can track each dependency loaded in the browser) is that it cannot dynamically load scripts. Therefore to initialise an app we must first send all of the scripts required to the browser. This is not necessarily a bad thing as it enables the use of resource hints and downloading scripts in parallel but it could also mean sending too much, decreasing the effectiveness of lazy loading.

  • The limitations of the Webpack runtime can be avoided by using techniques such as dynamic import() but this technique can actually increase the time it takes for scripts to be executed as the pieces may need to be parsed and executed in series in order to build a complete dependency tree. It is also unclear how this may be retrofitted to existing code.

  • Our apps build their own assets and are not aware of the asset graph of other apps. In order for applications to reuse any scripts they must generate files with the same name and contents.

So, please submit your ideas for a JavaScript code splitting strategy which:

  • Enables reuse of shared code across multiple user-facing microservices
  • Balances the number of files against the time required downloading and processing them
  • Does not prevent multiple entry points and/or lazy loading code
  • Enables effective use of resource hints and parallel downloads

One strategy discussed may be to create named bundles with hard-coded lists of dependencies to be included in each. This is based on the assumption that all user facing applications will at least require most of the code included. e.g.:

shared-libraries.js:

  • ftdomdelegate
  • superstore
  • superstore-sync
  • o-toggle
  • o-date
  • o-viewport
  • o-grid
  • o-lazy-load
  • n-ui-foundations

global-ui.js:

  • header
  • footer
  • n-topic-search
  • errors
  • tracking
  • ads

This is similar to the current approach taken by n-ui but this may not be granular enough to fully exploit the benefits of caching and maintaining a hard-coded list may not be sustainable.

A second strategy discussed by the team was to only split out dependencies which match a list of patterns, for example /^o-/ would match all Origami packages installed using Bower. This could strike a good balance between the number of files and benefits of caching, however it is unclear how these files would be re-assembled on the client-side - for example when an app has multiple entry points how will we know to load only the scripts required by one bundle?

Tracking which files are required by each entry point is a tricky problem. By default Webpack adopts a strategy of prepending each filename with the name of the bundle that requires it. For example if an app has two entry points; A and B, and both A and B depend on a package named C we could end up with the following script files:

  • A.bundle.js
  • B.bundle.js
  • A~B~C.bundle.js

This is straightforward to reassemble ("go get all bundles matching /A/) but does not work for our use case because another app may only have one entry point (D) and also rely on package C. Following this convention the second app would not be able to reuseA~B~C-bundle.js and would instead generate and use its own file (D~C.bundle.js).

Could we split dependencies by how frequently they change... i.e.

  • slowAndStable.bundle.js
  • changesAboutOnceAWeek.bundle.js
  • changesEveryDay.bundle.js

I suppose the rate of change is fluid for most repos. But there might be a sweet spot for increasing the number of files in a bundle in correlation with a lower rate of change, and thus the need to serve a new bundle.

It would be good to check out dynamic imports and the available polyfills.

It might be far fetched for us but I'm reminded of this article about code splitting on the google search site https://medium.com/@cramforce/designing-very-large-javascript-applications-6e013a3291a3

That article's comments reference this and this

@benbarnett I think this approach probably has the most effort to reward ratio that we've discussed so far as it's the simplest to implement and can make a big difference. I'm concerned that manual maintenance of this may become a burden if the bundles are split more than a few times though.

@kavanagh From the article you linked to:

Only load logic if it was rendered

Some of us did that years ago https://www.maketea.co.uk/2013/11/05/introducing-groundwork-js.html (AMD solved all this didn't it?! 😉)

We do aim to support dynamic import() but some more experimentation is required to see how best to use it. From what I understand this technique is suitable for lazy loading but has similar issues to #169 (comment) (e.g. if article-page.js does import lodash from 'lodash' but front-page.js does await import('lodash') then they won't be able to share the lodash dependency without extra effort.

A strategy was implemented in #195, integration added by #273, and has been documented in #281.

I will leave this issue open for now as I expect we may need to tweak the strategy a few times over the next few weeks.

I believe this is now complete so I am closing this issue.