w3c/ServiceWorker

postMessage keeps service workers alive indefinitely

mfalken opened this issue ยท 32 comments

Chrome has a bug where a service worker can postMessage to itself to stay alive indefinitely: https://bugs.chromium.org/p/chromium/issues/detail?id=647943

You can also register at two scopes and have those workers ping-pong messages.

I'm wondering if this should be prevented at the spec level, for example:

  • postMessage to a service worker that has no in-scope (or same-origin) client won't wake it up, or
  • postMessage from a service worker doesn't wake up the destination service worker

If this is an implementation detail, would something heuristic like a SW can be kept alive by at most 5 messages events be reasonable?

Workers can also ping-pong each other with fetch() and foreign fetch, so I guess a more general solution is needed.

I think they should be considered abnormal operations, and the user agent should detect and terminate rather than disallowing the usage itself.

"Keeping the worker alive" is a browser implementation detail, no? The whole grace timer mechanism just falls into the spec's "browser can kill the worker whenever it wants" language.

In FF we currently avoid this by conveniently not setting self.registration.active, etc. So its impossible to self postMessage() right now. When we implement that, though, I was planning to just inspect the source of the message event. If it was the same worker that was being posted to, then we would not increment the keep-alive reference count.

We decided that background sync can only be called when there's a document open on the origin. This prevents one-off background sync being turned into a periodic sync by recalling sync inside a sync event.

We'll want something similar for background fetch, but it'd be nice to be able to start a download in response to a push message.

I wonder if we can fix all these things with the same mechanism.

When we create an extendable event, we can assign an expiry time to it.

  • A fetch event gets an expiry time of x minutes
  • A postMessage sent by a client with an associated document (so a document or a worker linked to a document) gets an expiry time of x minutes
  • A postMessage sent by a service worker A to service worker B gets an expiry time equal to the longest expiry time currently holding service worker A open

This means two service workers post messaging each other can only stay alive for a total of 5 minutes.

If we're tagging extendable events like this, we can also say things like "background sync can only be used if the SW is being held open by a push/fetch event".

Too crazy?

I think it's OK to call this an implementation detail. I was originally thinking the spec gives room to terminate long-running events, whereas this is a bit different: the browser sees "postMessage" and decides to not postMessage. But you could consider this being the browser terminating the worker just before/during the postMessage call, so it fits under the spec language of termination "at any time".

Jake's is an interesting idea. I may use something like that for the implementation.

Hah, I just posted the exact same thing @jakearchibald described in the chromium bug for the same issue with regard to foreign fetch (crbug.com/662443). So yes, I think the solution jake describes makes a lot of sense, when generalized a bit more to apply to any event triggered by another service worker (such as in this case foreign fetch).

Also if in the future we start allowing service workers to start shared and/or dedicated workers, we should make sure that messages/fetches (regular and foreign) made by those workers get the same timeout treatment if a service worker is the only thing holding the worker alive.

And while I agree that the exact timeout behavior is probably fine as an implementation detail, we should probably at least make sure to mention this issue in the spec somewhere and encourage implementations to put some mitigations in place to prevent it.

FWIW, Chrome has implemented the timeout behavior described here for message events and foreign fetch events triggered by another service worker.

F2F: We should spec this "time passing" behaviour.

Why is it undesirable for a ServiceWorker to remain alive? Is there a tangible net impact outside of it's suggested implementation ?

Suggested implementation is pretty useless. What can we do in 5 seconds before it dies? Surely if web apps need to run 'like native apps' with all this service worker technology we need the service worker to stay active/alive not keep stopping due to its limited lifespan. We are having to move all our web app code to a android native wrapper and use a webview because the service worker stops running code. Users open up other apps and when the app loses focus, poor old service worker stops. At this rate we may as well just do away with the web app service worker thing altogether. very frustrating. We need code to run in background when the app loses focus otherwise its pointless. there, ive said it.

@cchighman

Why is it undesirable for a ServiceWorker to remain alive? Is there a tangible net impact outside of it's suggested implementation ?

Privacy & memory usage.

But users request to use our apps and they grant permission so why is privacy an issue? when its granted by the user?. if they wish to revoke permissions let them and all permissions and service workers etc etc can be cleaned up and memory freed. Why is this an issue with web apps? when we are consuming vast amounts of memory with streaming and downloading content more than ever. Who actually complains? the users want to use our apps not be inconvenienced by apps that stop working due to W3C policing how we write apps. its not helping developers its constraining them. just talking here.

@MarkEmerson

What can we do in 5 seconds

I think you're the first to mention "5 seconds".

we need the service worker to stay active/alive not keep stopping due to its limited lifespan

We have that with workers & shared workers.

We are having to move all our web app code to a android native wrapper and use a webview because the service worker stops running code

What kind of work are you trying to do that can't be done in a regular (or shared) worker?

We need code to run in background when the app loses focus otherwise its pointless

By the way, Android is starting to prevent apps from doing this. It's a big source of battery issues.

For particular long-running processes, you may be interested in https://w3c.github.io/wake-lock/.

@MarkEmerson

users request to use our apps and they grant permission so why is privacy an issue?

Service worker isn't behind a permission. Wake lock would be, but presumably it would involve some sort of persistent UI to inform the user that a particular app has a wake lock. This is similar to what recent Android versions do โ€“ they use a sticky notification.

we are consuming vast amounts of memory with streaming and downloading content more than ever.

But that's very visible to the user. We're planning similar visibility with https://github.com/WICG/background-fetch.

Who actually complains?

Users. They value their privacy and the battery life of their device, especially if these things become broken while the device's screen is off.

great responses - sorry for rant but having all sorts of issues with service worker at present. please tell me if ive missed something. Push API and firebase messages are working dandy but no geo available in service worker is not good for us as users need to use navigation apps and put our app in background. no geo when app loses focus either. we need a heart beat sent to server from client even if the app loses focus. but service worker stops due to its lifespan limitations. using firebase pings to wake service worker seems impractical - or is it?. are we missing anything fundamental here? can we achieve all of the above with shared workers? without the service worker stopping and when the app loses focus?

just reading some of the W3org discussions about service workers and background location etc. when i see comments on W3org like this: > 'Tracking the user in the background is highly likely to be a case of a site acting poorly'. According to who? who makes these assumptions on behalf of both user and developer?

using firebase pings to wake service worker seems impractical

Yeah, you have to show a notification as the result of a push. Again, we want this background work to be user-visible.

need to use navigation apps and put our app in background

It feels like https://w3c.github.io/wake-lock/ is what you're looking for. You want a CPU lock on the tab, maybe a screen lock too (if you're showing position on maps).

can we achieve all of the above with shared workers? without the service worker stopping and when the app loses focus?

Unfortunately not. Shared workers will be killed once their parent page is killed.

'Tracking the user in the background is highly likely to be a case of a site acting poorly'. According to who?

You don't have to search too hard to find angry articles like this https://www.theverge.com/2016/11/30/13763714/uber-location-data-tracking-app-privacy-ios-android. People generally don't like being spied on.

You may also be interested in https://techcrunch.com/2014/04/04/the-right-way-to-ask-users-for-ios-permissions/, which goes into some detail on permissions. It's difficult getting the balance right. You want powerful features, but you want to make sure the user is aware (continually if necessary) they're being used.

Agreed, the balance between usability, privacy and trust is the challenge. So the takeaway is: a web app will not support background tasks at present or near future. Id rather not move our web app code to our android wrapper but looks like we have no choice now. Thanks again Jake for all your invaluable feedback, references and advice. Mark

I think https://w3c.github.io/wake-lock/ gives us a way forward that balances the needs of background processing & keeping the user informed. But yeah, if you need to ship this year, a native wrapper will do it. Like I said, Android is cracking down on background work, so be careful you don't end up with the same problems there.

@wanderview, Chrome implemented the behavior in #980 (comment): https://codereview.chromium.org/2506263005. I'd like to check with you before spec'ing this if Firefox will implement the same.

Is #980 (comment) really what was implemented? I've been told that chrome will not kill a service worker if there is a controlled Client open. These timers are only applied on "background" service workers without any Client. That doesn't seem to be completely expressed in that comment.

Chrome certainly kills service workers (that sounds so wrong) with a controlled Client open. It's done that since the early implementation.

We implemented the basic idea of #980 (comment). Each running service worker has a "max timeout" computed as the longest timeout of all its requests, and when a new message or foreign fetch request arrives, the timeout of the new request is clamped to the max timeout of the originating service worker, so it can't extend the overall lifetime beyond the originating worker's.

What about if there is a fetch event? @jakearchibald suggested to me these should have infinite lifetime as long as the client exists in order to support playing long videos, etc.

Our implementation still kills the worker in this case. The behavior probably should change in the future. There's an open bug about it: https://bugs.chromium.org/p/chromium/issues/detail?id=678798

I guess I would be happy to see the spec say something like:

  1. The service worker has a lifetime budget. When the budget is exhausted the service worker is killed even if there are pending extension promises.
  2. The lifetime budget is updated per #980 (comment)
  3. The lifetime budget may also be dependent on the existence of a controlled client. For example, if a FetchEvent grants a large/infinite budget, but then the user closes the requesting window, then the SW should have that infinite budget reduced.

We can leave the exact budget values as an implementation detail for now since it seems some tuning is still happening.

@jakearchibald suggested to me these should have infinite lifetime as long as the client exists in order to support playing long videos, etc.

The lifetime budget may also be dependent on the existence of a controlled client.

I think https://github.com/WICG/background-fetch#playing-a-podcast-as-its-background-fetching would solve the issue posed in https://bugs.chromium.org/p/chromium/issues/detail?id=678798. That considered, I think we better keep the hard lifetime budget to be in favor of battery usage and to not allow maliciously written service workers.

Apart from the battery issue, it might become a an overall performance issue, if there's no lifetime budget.

We do not want services like Coinhive getting tied to service workers, without users being notified about it.

There are good usecases too with having long running service worker. Service worker are better then SharedWorker if it were not for the short lifetime. In a since that it can

  • handle FetchEvent.
  • Download stuff with streams, respondWith & content-disposition attachment.
  • Stay alive durning reload
  • Do backgroundFetch
  • Push
  • Sync
  • Caching
  • And as the title describes, sending postMessages to all Clients (tabs+workers), And much more.

(+ Safari never went ahead and implemented SharedWorkers, and i guess they never will - b/c of service worker maybe? And b/c of this i have always gone for a service worker approach instead of using SharedWorkers)

I would like to see SharedWorker gone and be replaced with ServiceWorker instead. Kind of a waste if you need to do cross tab stuff with a single global state and also listen to fetch events to serve stuff from Cache storage at the same time... You could just as well solve it with one worker instead of having two.

I think the hard short lifetime is stupid and it will only make developer do this kind of postMessage hack, clearly there are good use cases for having long lived service worker also. Developers have asked for long lived service worker many times now (even when the site is not open). This is one of those Daemon things that would make you develop a Native app instead of a PWA, which is sad.

I think the "stay alive while any origin tab is open" is better approach then a timer, when the user closes all associated tabs only then would it be more acceptable to start a 5 minutes timer. This would make it behave more like a SharedWorker and they don't have this restricted 5 minutes to do it's job so why should service worker have that while you are on the website and SharedWorker do not? So what makes a ServiceWorker so frightful that it needs a "hard" countdown on every single ServiceWorker event it receives before killing the worker? I could as well just start up a shared worker and forward all event to it instead which seems a bit wasteful especially if you need to do something in the service worker scope

Now it may sound to me like i think the hole waitUntil is useless and we should remove waitUntil but that's is not really the case, I still think it's useful to put the worker down when it is not needed any more and woken up later again when needed.
I still think that the design patern that waitUntil should be called and used everywhere, especially so if no tab is open. But if the site have reasons for keeping a long lived service worker alive and the user is using the site, then let them keep it alive. otherwise you can start a timer

I say let the service worker take on the roll as a SharedWorker if the developer wants it.
Don't stop us from sending ping messages and filling in a gap in Safari where SharedWorker are gone.


Now for some real indefinitely long lived background Service Worker Daemon (without a open tab use cases)

WebTorrent, IPFS. PeerCloud, DHT, PeerCDN, StreamRoot, and Peer5 and others would all benefit from having WebRTC in service worker and run for longer period of time in the background without having a tab open to the same origin.
They are all best suited in either SharedWorker or ServiceWorker but the lack of WebRTC in those worker context you don't want to spin them up in every main thread.
WebTorrent could serv video files from Service worker with respondWith instead of using MSE. This lets browser handle seeking and more supports more other video containers here is a PoC of that

StreamSaver.js don't use WebRTC but it also needs to live longer to save large data to the hard drive while de/encrypting stuff on the fly.

I almost think the hole backgroundFetch was unnecessary first and could have been avoided by other means if you only let us keep the service worker alive for longer.

The Wake lock don't solve all scenarios, there are use cases where not everything have to be displayed on the screen.

But if we are worried about things like Coinhive and other crypto miner and sites spying on us all the time, can't browser then instead help to defeat them with a prompt saying something like

"example.com" is running a high CPU intensive Service worker" or
"example.com" have been running a worker for more then 20 minutes"

Kill it block it (meaning: from ever running again without a tab open) or keep it alive

is the phone in your pocket and you don't see this message then by all means kill it put a little timer that presses the kill it button after x seconds Kill it (5..4..3..0)

Developer would do good by trying to avoid those bad UX message from ever popping up and use the waitUntil rightfully so

Or don't show this messages, just kill it when the user have closed all tabs (I think a service worker should have the same lifetime expectancies as a SharedWorker)
let us ask for permission using the Permission API. when there are good legitimitet reason for doing so with a context describing why a sites wants service worker to stay alive. We already require user interaction for permission request which is good so you can't prompt directly upon visit.

The idea with some lifetime budget is not that bad either. But I guess I would prefer a Permission request instead.

I almost think the hole backgroundFetch was unnecessary first and could have been avoided by other means if you only let us keep the service worker alive for longer.

This would be a battery drain and a privacy issue. Messages don't help while the phone is in the user's pocket.

Or don't show this messages, just kill it when the user have closed all tabs

That wouldn't be very 'background'.

We're not going to allow a service worker to run indefinitely. Features that require some form of persistence should be handed off to the OS somehow, with a way for the origin to collect the data later (like how background fetch works).

Closing this in favour of #1566 which covers how to spec expiry.

@jakearchibald I'm new to service workers, so I may be making some false assumptions, but it seems to me that being able to have the service worker live while tabs are open would be a huge step in the direction of simplifying web development with MPAs. I mean, sometimes you need to have some stuff that keeps running for as long as the page is open, like web sockets, and, currently, AFAIK that can only be accomplished with SPAs. So why not let service workers fill that gap? Web development is so much easier with MPAs.