Consider relying on eTags (or other headers) for service worker dependencies to check for updates
delapuente opened this issue · 57 comments
For the sake of modularization and isolation: could the update algorithm be improved to rely on the ETag / Content-Length / other headers sent by the server to decide when a service worker has changed? Right now we need to include a mark of change in the SW itself, and this forces a lot of developers to postprocess their service worker files.
If the request is made with `If-Modified-Since` or `If-None-Match`, and the response is 200, we could assume this is a new SW even if it's byte-identical. Although this would cause unnecessary service worker updates on servers that send out `ETag` or `Last-Modified` but don't correctly return 304.
Else we just add `Service-Worker-Force-Update: 1` or something.
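i.e. the server's response for the SW script would look something like this (the header name is just this thread's strawman, values illustrative):

```
HTTP/1.1 200 OK
Content-Type: application/javascript
Service-Worker-Force-Update: 1
```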
More than checking if the main SW file changed, I'm referring to its dependencies imported via `importScripts()`.
Yep, I understand that: you'd be sending an ETag (or Last-Modified) that represented the SW and its dependencies.
Maybe that's too hacky?
I think that undermines the purpose of the ETag, which is to digest the content being served. It would do the trick, but I think it's better to come up with something more pluggable into the current ecosystem that doesn't require hacking the semantics. ;)
Yeah, I think you're right. `Service-Worker-Force-Update: 1` would work.
I think we should have a JS API for this, `reg.update({force: true})` perhaps.
F2F: agreement on `serviceworker.skipWaiting()` and `reg.update({force: true})` - but may need to look at naming of `force`.
Lots of problems using etags for this, e.g. if a CDN stops serving etags for some reason.
Rough idea around `some-header-name: value`, where value is a digest like an ETag.
`reg.update({force: true})` will leave you with the existing worker if the update fails, as usual.
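For illustration, a page might call it like this (the `force` option is only the strawman from this thread, not a shipped API):

```js
navigator.serviceWorker.ready.then(reg => {
  // Re-fetch the worker script and install it even if it's byte-identical.
  return reg.update({ force: true });
});
```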
Was anyone able to dig up the previous proposal from @ehsan and I? I can't find it.
+1 for this
There should be a way to update the imported scripts even when the main code of the service worker is unchanged.
This is also a big issue for BaaS providers whose scripts are imported into their customers' service workers: I have described our situation more extensively in this blog post.
@jakearchibald Can you remind me why we don't just check importScripts() for byte changes as well? I seem to recall it was since importScripts could be called async way back when, but now we've made that throw. Since we only allow sync importScripts maybe we can just include it in the byte check.
See issue #639 for the importScripts discussion, which roughly concluded with "let's revisit this later".
Can you remind me why we don't just check importScripts() for byte changes as well?
It could mean a lot of network requests just for an update check. I was worried about that. But also it means a change to a third party script would result in a whole new SW install, which sounds a bit… invasive.
I like `reg.update({force: true})` as it gives us a script-land way to do Chrome's "update on reload", but maybe for third party scripts we need to revisit the idea of providing access to the cache the SW uses to store its scripts.
```js
importScripts('//example.com/whatever.js');

// then later…
self.registration.getCache().then(cache => {
  cache.add('//example.com/whatever.js');
});
```
My big worry here is security. We'd only want the SW to be able to access this, as giving access from pages would turn a minor XSS into a huge problem.
It could mean a lot of network requests just for an update check.
Sure, but this is a trade off sites can make for themselves. Do I want the convenience of structuring my code in separate files? Or do I want to minimize the number of SW script 304s my server has to send?
I could see smaller shops opting for developer convenience while huge sites compact everything into a single file to minimize server load.
But also it means a change to a third party script would result in a whole new SW install, which sounds a bit… invasive.
What I'm hearing is that this is what developers expect to happen and they are surprised when it doesn't.
I like reg.update({force: true}) as it gives us a script-land way to do Chrome's "update on reload"
I like this too, but I think it's a different use case. From what I can tell, developers want to compose their service worker scripts from decoupled sources and have things just update to the latest. Every step we add to get the updates to trigger creates friction and requires tighter coupling between modules.
At the very least it seems we could do this as an opt-in to `register()`. Something like `update-checks-imports: true`.
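Purely as a sketch, something along these lines (the option name here is invented for illustration, nothing is specced):

```js
// Hypothetical opt-in — neither the name nor the behaviour exists today.
navigator.serviceWorker.register('/sw.js', {
  updateChecksImports: true // also byte-compare scripts pulled in via importScripts()
});
```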
`update-checks-imports: true` is interesting. Or something called during the install event to set which scripts should be checked.
If I show a "please refresh for latest version" message when there's an update waiting, I'm not sure I'd want that just because Google Analytics or whoever had updated something.
Then again, third party services like Analytics will live in Foreign Fetch instead.
this is what developers expect to happen and they are surprised when it doesn't.
Exactly! IMHO the web is so great (and JavaScript is becoming pervasive) because of its simplicity. Please don't create a giant and complex monster: leave that to native apps. And please don't fall into premature optimization.
`update-checks-imports: true` is interesting
I agree. But I don't think that choice should be left to the user: what if they decline? Then they would never get updates and the scripts would eventually break.
I think that if you don't want all scripts to be refreshed by default, the choice should be left to the developer. For example: `importScripts('//example.com/whatever.js', { checkForUpdates: true });`. That way the developer can prevent the refresh for large files (like Analytics) and allow smaller, more useful scripts to be refreshed automatically.
update-checks-imports:true is interesting. Or something called during the install event to set which scripts should be checked.
I was thinking about this:

```js
// Makes the update algorithm byte-to-byte compare this dependency.
importScripts('//3rd-party.com/whatever.js', { forceUpdate: true });
```

This way, the developer can mark which dependencies trigger updates; the trade-off @wanderview was talking about is made explicit, and at the same time you can preserve file sanity via modularization.
We could make `forceUpdate` default to `true` (we'd have to change the current spec, but it would end up as a more predictable API) or `false` (preserving the current spec). Furthermore, if at some point the developer no longer wants a dependency to be part of the check, she simply flips the flag.
Having the option in `importScripts` makes real sense. Nice. Unfortunately it doesn't cover JavaScript modules so well.
But do we want to make a service-worker-specific `importScripts()` interface? This would not work if someone uses a library that then uses the existing `importScripts()` API internally. The top-level import would get updated but not any of its dependencies.
I think it would be better to put this on the install or activate event personally. It can then automatically apply to all `importScripts` calls, modules, or other future methods of bringing in script.
Sure. The thing I liked about the `importScripts` solution is it was at a resource level, but I'm sure we can achieve that via another API.
*pondering* If the API was something like `alsoCheckTheByteEqualityOfThese(requests)`, could you include things that you weren't using in modules and `importScripts`? That would enable you to have a single resource that echoed the version number. Dunno if that's useful.
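A rough sketch, purely hypothetical (assuming the method hangs off the registration and takes a list of URLs — none of this is specced):

```js
// Hypothetical API, name and shape taken from the musing above.
addEventListener('install', event => {
  // Include a resource in the byte-for-byte update check even though
  // nothing in this worker actually imports it, e.g. a tiny version file.
  self.registration.alsoCheckTheByteEqualityOfThese(['/sw-version.txt']);
});
```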
Sure. The thing I liked about the importScripts solution is it was at a resource level, but I'm sure we can achieve that via another API.
That was the idea.
But do we want to make a service worker specific importScripts() interface? This would not work if someone uses a library that then uses the existing importScripts() API internally.
Well, what I would find extremely uncomfortable is having to re-declare my imports for marking purposes only. Perhaps introduce a new import function (`importScriptForcingUpdate(...)`)?
Dealing with ES6 modules is complicated, but what about a pragma:

```js
import "my-library";
"force update";
import "my-other-library";
```

I don't really like it, and I don't really know if there is a standard mechanism to introduce "use strict"-like pragmas in ES6, but declarative APIs have this kind of inconvenience.
F2F:
- Should we check all imported scripts by default? Yes
- Check the flattened imported scripts & the main script; if any of them are byte-by-byte different, including !ok responses, trigger an update (where a !ok response will fail the update)
- The browser may optimise for this, e.g. if the main script has changed it doesn't need to check its imported scripts
- No opt-out of this
FWIW, as a service worker tooling author, I'm really happy about this.
`sw-precache` currently forces developers to use its output as a top-level service worker script, because it relies on the byte-by-byte check of the versioning information it includes inline to trigger the Install flow.
It sounds like after this change is implemented, developers can start pulling in the `sw-precache` output via `importScripts()` while maintaining the same ability to trigger the Install flow.
Same feedback as @jeffposnick: we have created some tools (gulp-sww), and right now we have to do nasty tricks (like creating an unused variable with a timestamp to ensure a change in the registered script).
This will be pretty useful :)
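For anyone curious, a minimal sketch of the kind of trick meant above (the variable name and value are illustrative, not what gulp-sww actually emits):

```js
// Injected at build time: the value changes on every build, so the registered
// script is never byte-identical and the browser's update check always fires.
const __BUILD_TIMESTAMP__ = '2016-09-21T10:42:00.000Z'; // unused, only forces a byte change

importScripts('app-sw-logic.js'); // the real worker code lives in the import
```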
I created a bug for this functionality at https://bugs.chromium.org/p/chromium/issues/detail?id=648295 in case anyone can pick that up on the Chromium side of things.
Heads up that I'm starting to look into implementing this in Chromium due to high demand. It'd be nice to have spec text.
We had an implementation question about this today actually.
Currently the spec allows `importScripts()` to run asynchronously up until the install event `waitUntil()` resolves. See:
- Step 2 of https://w3c.github.io/ServiceWorker/#importscripts
- And step 15 of https://w3c.github.io/ServiceWorker/#install
We don't currently implement this in Firefox. We still require `importScripts()` to run synchronously at script evaluation time. @mattto, does Chrome implement async `importScripts()` like this spec text yet?
If not, I think it would be helpful to set the "imported scripts updated flag" immediately after script evaluation time instead of waiting until after the end of install processing.
Thoughts?
Edit: I mean, it might be helpful in order to implement the update check. We can simply evaluate the script again. Or maybe it's orthogonal to this issue.
Chrome does sync importScripts. I'm not sure I understand how the spec allows async importScripts() since https://w3c.github.io/ServiceWorker/#importscripts doesn't mention "in parallel" so I'd assume the steps run in sequence. If I understand correctly, the "updated flag" just indicates "read from network" vs "read from installed service worker storage".
Well, I think it's more about when that flag is set. AFAICT, it's set when the install processing completes, which is an async job. Or is it set elsewhere and I am confused?
Ah, yes. I don't understand why it's set at that point.
I think Chrome expects `importScripts()` to run on initial script evaluation and will error or not work correctly otherwise. Would have to test that.
@mattto @wanderview I've opened #1021 for the `importScripts` flag discussion.
Ok. Happy to see the "importScripts() after script eval" issue tackled separately from this issue. We will likely implement the byte-for-byte check for importScripts() first and then consider the delayed importScripts() issue later.
So, this thread is now for #839 (comment).
Check the flattened imported scripts & the main script; if any of them are byte-by-byte different, including !ok responses, trigger an update (where a !ok response will fail the update)
The timing of the byte-by-byte difference check depends on the decision in #1021. But for the work in this thread I'll assume the check is done before invoking Install.
I think the byte-by-byte check should happen before running the SW. I think authors wouldn't expect a new SW to be spawned on each soft update attempt, and it'd be quite wasteful.
I think the timing doesn't necessarily depend on whether delayed importScripts() is supported. When a SW is installed, you can store the URL of each script (main and imported) and their bytes. Then when an update starts, you fetch those URLs and compare. If they are the same you abort the update.
Then when an update starts, you fetch those URLs and compare. If they are the same you abort the update.
Isn't the resolution of this issue to include imported scripts in the byte-by-byte check for updates? How can we detect whether imported scripts have been byte-changed or not without calling `importScripts()`?
Also, based on the resolution here, I think the byte-by-byte check should be done before invoking Install: #1021 (comment).
How can we detect whether imported scripts have been byte-changed or not without calling importScripts()?
Maybe I'm missing something, but I thought you'd just fetch the script and then compare it to the one on disk. That is, when a SW is installed you have on disk all its scripts (both main and imported) and those URLs. To do a byte-to-byte comparison, you'd fetch all the scripts and compare the bytes on disk.
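Rough pseudocode to make that flow concrete (helper names invented; this is not spec text):

```js
// installedScripts: Map<url, ArrayBuffer> saved when the worker was installed.
async function scriptsHaveChanged(installedScripts) {
  for (const [url, storedBytes] of installedScripts) {
    // Revalidate against the server rather than trusting the HTTP cache (cf. #893).
    const response = await fetch(url, { cache: 'no-cache' });
    if (!response.ok) return true; // a !ok response triggers (and then fails) the update
    const freshBytes = await response.arrayBuffer();
    if (!buffersEqual(freshBytes, storedBytes)) return true;
  }
  return false; // everything byte-identical: abort the update
}

function buffersEqual(a, b) {
  if (a.byteLength !== b.byteLength) return false;
  const va = new Uint8Array(a);
  const vb = new Uint8Array(b);
  for (let i = 0; i < va.length; i++) {
    if (va[i] !== vb[i]) return false;
  }
  return true;
}
```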
I think I got your idea. That'd certainly be better!
I was worried about the case where the sequence of imported scripts changes… but in that case the original script would have been altered anyway.
Are the byte-for-byte checks shared across different service workers? (If two service workers import the same script, are the etags/hashes "shared"?)
If so, in some circumstances the browser will immediately know (i.e. without needing to do a network byte-for-byte check) that a dependency in the "middle" of a dependency graph has been updated. What happens next?
```
A
+- B
|  +- C
+- D
```
If the browser knows that `B` has been updated (because some other service worker also had a dependency on `B`), which parts of the tree are re-checked? Also, for how long is a freshness check considered to be valid?
Are the byte-for-byte checks shared across different service workers? (If two service workers import the same script, are the etags/hashes "shared"?)
When a service worker fetches a script (either the main script or imported), it will go to the network (optionally) via the HTTP cache. It won't go to the cache API or the script cache of any other service workers.
@jakearchibald Can that lead to a situation where the resources `A` and `B` are both updated at the origin, but the browser only notices that `B` has changed (because of timing-related artefacts of the HTTP cache), and so updates the SW using the "old" version of `A` and the "new" version of `B`?
@ithinkihaveacat yep. Same is true for HTML documents. We have made the HTTP cache opt-in because of developer confusion around this (#893). Developers who opt into the HTTP cache should understand how it works.
Yea, I agree they are orthogonal issues.
An interesting point has been raised internally: is it possible we could damage sites relying on caching by making this change? I'll reach out to our biggest users and see how they feel. Worst comes to worst, we could make no-cache opt-in.
@jakearchibald When you talk to these sites, can you also mention there is a workaround if they are serving uniquely hashed resources? They can set `Cache-Control: immutable` with a very large `max-age` to avoid these network requests entirely in Firefox/Chrome.
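i.e. serving the hashed scripts with something like (values illustrative):

```
Cache-Control: public, max-age=31536000, immutable
```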
@wanderview Sites that are able to generate unique hashed resources wouldn't really need this feature though, right?
I thought the point was to make it possible for service workers to do e.g. `importScripts('https://www.gstatic.com/firebasejs/firebase-app.js');` and be able to quickly and reliably pick up changes to https://www.gstatic.com/firebasejs/firebase-app.js even if the service worker itself remained byte-for-byte identical.
If the default for all network activity related to service worker update checks becomes `no-cache` (as per #839 (comment)), then that's going to result in a lot of 304s for any widely deployed resource. (However, not doing this would lead to browsers potentially getting themselves into an inconsistent state, #839 (comment).)
I thought the point was to make it possible for service workers to do e.g. importScripts('https://www.gstatic.com/firebasejs/firebase-app.js'); and be able to quickly and reliably pick up changes to https://www.gstatic.com/firebasejs/firebase-app.js even if the service worker itself remained byte-for-byte identical.
I guess I thought people typically versioned 3rd party dependencies. Allowing external dependencies to float at-will in production seems kind of crazy to me.
I guess I thought people typically versioned 3rd party dependencies. Allowing external dependencies to float at-will in production seems kind of crazy to me.
I suppose it depends on the use case. Something like https://www.google-analytics.com/ga.js isn't versioned, and that works out fine.
https://www.hodinkee.com/OneSignalSDKWorker.js consists of one line:

```js
importScripts('https://cdn.onesignal.com/sdks/OneSignalSDK.js');
```

Obviously whatever's in OneSignalSDK.js could be inlined into OneSignalSDKWorker.js (it would even save a network request), but then OneSignal would need to get Hodinkee to deploy a new version every time they update their SDK.
Is this already implemented in Chrome or Firefox?
Is this already implemented in Chrome or Firefox?
Updating based on `importScripts()` in FF has been started, but not completed:
https://bugzilla.mozilla.org/show_bug.cgi?id=1290951
Related to this, defaulting updates to `no-cache` is already implemented in FF53.
@KenjiBaheux & I should email SW users to make sure big users of SW are aware of this.
The F2F resolution was to check importScripts in the byte-for-byte comparison; however, issue #893 later changed things so that `useCache` would specify caching of the importScripts by default.
While working on this issue with @mattto, I found out we need to discuss when to fetch and compare the imported classic scripts for Update. (See #1283 (comment).) Now we have two options:
1. Fetch imported scripts during the first evaluation of the main script in Update.
2. Fetch imported scripts (of newestWorker) before evaluating the main script.
(2) allows us to return early, even before starting a worker. @jakearchibald seemed to be concerned about double-download (https://github.com/w3c/ServiceWorker/pull/1023/files#r92201798) here, but we can avoid `importScripts()` in the main script downloading the scripts from the network, because we fill the cache before that point.
But if the imported scripts in (2) have errors, we can't avoid running the main script and the cached scripts before catching those errors anyway.
Thoughts?
/cc @jakearchibald @wanderview @aliams @cdumez
EDIT: I tried it with (1) in #1283.
I responded in #1283 (comment), but to reiterate here, I'm much more concerned about needlessly starting a service worker in the common case than in the error case. I think we should avoid starting a service worker until the byte-to-byte update check (including importScripts) shows that an update is possible. Otherwise almost every navigation will start a new service worker to do an update check which will usually just be wasted.
That's a fair point. I agree with (2). We can do this without a double-download.