Altinn/app-lib-dotnet

Loading different versions of app-frontend-react in staging vs production

olemartinorg opened this issue · 1 comments

Background

I have created a series of issues to look for solutions to the problem where new releases of app-frontend-react with subtle bugs in them can cause show-stopping problems in the production environment (see related issues below). As discussed in the postmortem meeting today, the fundamental problem might be that app-frontend-react releases go straight to the production environment (via the CDN), and apps in both the staging (tt02) and production environments load the latest release at the same time.

The frontend is loaded from the Index.cshtml file in each app. By default, this loads the latest major version, which all currently released apps reference.

Proposed solution

Instead of loading the frontend directly, we should automatically switch between different versions (i.e. staging and production). Several ways to achieve this have been proposed:

  • Implementing a lightweight 'loader' script that looks at the current hostname, and maps it to the correct environment to load the frontend version from.
    • Alternatively, we could pass the current environment (staging or production) from the backend (via an environment variable injected into an HTTP header, a generated JavaScript snippet, etc.), so that the lightweight loader script can use that instead of the hostname to decide which script to load.
  • Replace Index.cshtml altogether with a simpler configuration file (and generate Index.cshtml from it). This seems to be the philosophy behind the NuGet packages >v7, where we prefer configuration files over 'use our default implementation or extend it if you want to'. This configuration file could list optional extra scripts/styles to go into the generated index file, along with which major version of the frontend to use (or pin it to a specific version). The backend NuGet libraries can then decide which environment (staging or production) to load the frontend from.
  • Such a loader could also roll out a new release gradually, so it does not hit all users in the production environment at once (much like mender.io does, by allowing you to roll out to a subset of users at a time). This could be as simple as implementing a sliding window algorithm in which each client randomly decides whether to load the latest (bleeding edge) release or the known stable one. As long as the weights shift toward the latest release over time (starting from the point at which a new release becomes available), and we measure errors as they occur (with mechanisms for aborting a new release automatically or manually), we can guarantee that a new release is in production within a given timeframe after it becomes available. This could keep downtime to a minimum when critical problems occur, even (and especially) in a large-scale environment.
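
To make the loader idea more concrete, here is a rough sketch of the decision logic, combining the hostname mapping with a time-weighted random rollout. Everything in it is an assumption for illustration only: the `ReleaseChannel` shape, the function names, the version strings, the linear ramp, and the rule that a hostname containing a `tt02` label means staging. None of this reflects an existing API in app-frontend-react or the backend libraries.

```typescript
// Hypothetical sketch: names, fields and version strings are placeholders.
type Environment = "staging" | "production";

interface ReleaseChannel {
  stable: string;           // known-good version
  latest: string;           // newest (bleeding edge) release
  latestReleasedAt: number; // epoch ms when `latest` became available
  rolloutMs: number;        // time until every production client gets `latest`
  aborted: boolean;         // set (manually or automatically) to stop the rollout
}

// Assumption: staging hostnames contain a "tt02" label, e.g. "org.apps.tt02.example.com".
function environmentFromHostname(hostname: string): Environment {
  const labels = hostname.toLowerCase().split(".");
  return labels.indexOf("tt02") !== -1 ? "staging" : "production";
}

// Fraction of production clients that should get `latest` at time `now`.
// Linear ramp from 0 (at release time) to 1 (after `rolloutMs`).
function rolloutFraction(channel: ReleaseChannel, now: number): number {
  if (channel.rolloutMs <= 0) {
    return 1;
  }
  const elapsed = now - channel.latestReleasedAt;
  return Math.min(1, Math.max(0, elapsed / channel.rolloutMs));
}

// `roll` is a random number in [0, 1), e.g. Math.random(), chosen by the client.
function pickVersion(
  env: Environment,
  channel: ReleaseChannel,
  now: number,
  roll: number
): string {
  if (channel.aborted) {
    return channel.stable; // an aborted release falls back to stable everywhere
  }
  if (env === "staging") {
    return channel.latest; // staging always gets the newest release first
  }
  return roll < rolloutFraction(channel, now) ? channel.latest : channel.stable;
}
```

In a browser, the loader would call something like `pickVersion(environmentFromHostname(window.location.hostname), channel, Date.now(), Math.random())` and then append a script tag for the chosen version. Storing the random roll per client (for example in localStorage) would keep a user from flipping between versions on every page load, at the cost of a little extra state.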

Related issues

We kind of need Index.cshtml to reference a specific frontend version anyway (with some way of updating it automatically). The current "silent" updates have caching issues where CSS and JS might get out of sync. When we implement code splitting or add more assets, it will become even more problematic when the main JS file gets downloaded and all the assets it references are gone. A bad release (as we have had, and will have again) currently persists in Chrome for a long time (unless the user presses Ctrl+Shift+R), and support will get confused because they have not gotten the bad release yet.