cloudflare/workers-chat-demo

How do duration limits work with websockets?

WesleyYue opened this issue · 7 comments

One thing I find very confusing about Cloudflare Workers and WebSockets in the documentation is that it is not clear how sustaining a websocket makes sense when there are hard limits on the worker duration. After reading many pages of the docs, this is what I think the model is:

  • Workers, unlike traditional FaaS, are priced by CPU Time, which is different from Wall Clock Duration.
  • CPU Time is strictly the time when a line of code executes. Wall Clock Duration is CPU Time + time spent waiting on async calls while the CPU is idle.
  • Because of this, 30s of CPU time (the Workers limit) typically maps to a significantly longer Wall Clock Duration. That makes duration limits a non-issue for sustaining a websocket connection: each message takes ~1ms (is this the right magnitude?), giving you roughly 30,000 messages per request on the Unbound plan. However, because there's a higher chance of eviction after 30s of Wall Clock Duration, in practice you'll get fewer than 30,000 messages per request.

My questions:

  • The marketing page seems to contradict the docs on pricing. The landing page says 30s wall time per request, but everywhere I look, the documentation seems to suggest either 50ms of CPU time for the Bundled plan or unlimited Wall Clock Duration for the Unbound plan?
  • What exactly is the model for eviction? Can my Worker get evicted mid execution? Can my Durable Object Worker get evicted while idle but holding a bunch of open connections? (Is that why there needs to be logic to rejoin on the client side? Because the Durable Object can get killed at any point?)

So, the first thing that's important to understand is that time for pricing purposes, and time limits after which your worker is canceled, are two different things. It's easy to get confused between these since they use similar numbers, but you should consider them to be essentially unrelated.

  • Under the Workers Unbound pricing model (the default for the paid plan), you are billed for wall time, not CPU time. Or, more precisely, you are billed for the wall time or 8x CPU time, whichever is greater -- but usually wall time is greater. However, there is a cap on this billing at 30s. If you use more than 30s for a single request, the additional time is not charged.
  • At the same time, there is a 30s limit per request on CPU time. After you've consumed 30s of CPU time, the request is forcefully canceled. (The marketing screen shot does seem to misstate this point...)
  • A third, much less talked-about concern is that when requests run longer than 30s (wall time), the chance that they will be randomly canceled increases. Technically, any request might be randomly canceled at any time, for example, if the machine running it has a sudden hardware failure. However, when a request runs beyond 30s, the number of reasons for random cancellation increases. In particular, when we are upgrading the Cloudflare Workers runtime itself, we allow in-flight requests up to 30s to finish what they were doing before we restart the server.
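
To make the arithmetic in the three points above concrete, here is a minimal TypeScript sketch. The numbers (8x multiplier, 30s billing cap, 30s CPU limit) are taken from this comment and may not match current pricing; the function and type names are made up for illustration and are not part of any Workers API.

```typescript
// Constants as described in this thread; verify against current docs before relying on them.
const BILLING_CAP_MS = 30000; // billed duration is capped at 30s per request
const CPU_MULTIPLIER = 8;     // billed for wall time or 8x CPU time, whichever is greater
const CPU_LIMIT_MS = 30000;   // the request is canceled once it has used 30s of CPU time

// Hypothetical record of what one stateless request consumed.
interface RequestUsage {
  wallMs: number;
  cpuMs: number;
}

// Billed duration: max(wall, 8 x CPU), capped at 30s.
function billedDurationMs(usage: RequestUsage): number {
  const uncapped = Math.max(usage.wallMs, CPU_MULTIPLIER * usage.cpuMs);
  return Math.min(uncapped, BILLING_CAP_MS);
}

// The hard limit is on CPU time only and is unrelated to the billing cap.
function exceedsCpuLimit(usage: RequestUsage): boolean {
  return usage.cpuMs >= CPU_LIMIT_MS;
}

// A WebSocket proxied for 10 minutes with no JavaScript message handling:
const proxiedBilledMs = billedDurationMs({ wallMs: 600000, cpuMs: 0 }); // 30000 (capped)

// A short but CPU-heavy request: 8 x 200ms exceeds the 500ms of wall time.
const cpuHeavyBilledMs = billedDurationMs({ wallMs: 500, cpuMs: 200 }); // 1600
```

Note that the 10-minute proxied connection is nowhere near the CPU limit, which is why the limit and the billing cap should be thought of separately.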

When it comes to WebSockets (or similarly, long streaming responses), if you are simply proxying through your worker (either to origin, or to a Durable Object), WITHOUT processing individual messages / chunks using a JavaScript event handler, then this proxying takes zero CPU time, because no JavaScript code is executed. In that case, a single request can theoretically run for an unlimited amount of time, although practically speaking it's likely to be randomly canceled at some point. Assuming it runs for more than 30s, you will be charged for 30s of duration but no more.
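
As a rough illustration of what "simply proxying" means here, the sketch below hands the request to an upstream and returns the upstream's response untouched, without attaching any per-message handler. The `Upstream` interface and `makePassThroughHandler` are hypothetical stand-ins so the example is self-contained; they are not the runtime's own types.

```typescript
// Structural stand-in for anything a request can be forwarded to
// (an origin, or a Durable Object stub). Hypothetical, for illustration only.
interface Upstream<Req, Res> {
  fetch(request: Req): Promise<Res>;
}

// Returns a handler that forwards the request and returns the upstream's
// response promise as-is. No message or chunk listeners are registered,
// so no JavaScript runs for individual WebSocket messages.
function makePassThroughHandler<Req, Res>(upstream: Upstream<Req, Res>) {
  return {
    fetch(request: Req): Promise<Res> {
      return upstream.fetch(request);
    },
  };
}
```

In an actual Worker I would expect the handler body to be just `return fetch(request)` (to origin) or `return stub.fetch(request)` (to a Durable Object). As soon as the Worker accepts the WebSocket itself and adds a message listener, each message costs CPU time again.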

Note that the above discussion all applies to stateless workers, NOT durable objects. Durable Objects are different because they do not have discrete "requests". The entire lifetime of the object is essentially billed like one request. However, the 30s cap on billing does NOT apply to Durable Objects. If a Durable Object runs for a whole day, it is billed for 86400s of duration. The DO stops running (and stops incurring charges) when there are no clients connected, but aside from that, the number of clients is irrelevant -- the charge is the same for 1 client vs. 1000.

Note that this means allocating one Durable Object for every WebSocket connection is currently rather expensive. We recommend an architecture where many WebSockets terminate at a single DO. However, we're working on improvements that will allow a DO to sleep even when a WebSocket is attached (as long as the WebSocket is idle).
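
A small sketch of why the two architectures differ in cost under the billing model described above (a DO is billed for the whole time at least one client is connected, regardless of how many). It compares billed seconds only and does not model prices. `ConnectionWindow` and both function names are hypothetical, and the model ignores any delay before an idle object actually shuts down.

```typescript
// Hypothetical record: when one WebSocket connected and disconnected (in seconds).
interface ConnectionWindow {
  startS: number;
  endS: number;
}

// One Durable Object per connection: every connection pays for its own duration.
function perConnectionObjectSeconds(conns: ConnectionWindow[]): number {
  return conns.reduce((total, c) => total + (c.endS - c.startS), 0);
}

// One shared Durable Object: billed for the time during which at least one
// client is connected, i.e. the total length of the union of all windows.
function sharedObjectSeconds(conns: ConnectionWindow[]): number {
  const sorted = conns.slice().sort((a, b) => a.startS - b.startS);
  let total = 0;
  let curStart = 0;
  let curEnd = 0;
  let open = false;
  for (const c of sorted) {
    if (!open) {
      curStart = c.startS;
      curEnd = c.endS;
      open = true;
    } else if (c.startS <= curEnd) {
      curEnd = Math.max(curEnd, c.endS); // overlaps the current busy period
    } else {
      total += curEnd - curStart;        // gap: the object was idle in between
      curStart = c.startS;
      curEnd = c.endS;
    }
  }
  return open ? total + (curEnd - curStart) : total;
}

// 1000 clients, all connected for the same hour.
const sameHour: ConnectionWindow[] = [];
for (let i = 0; i < 1000; i++) {
  sameHour.push({ startS: 0, endS: 3600 });
}
const separateObjects = perConnectionObjectSeconds(sameHour); // 3600000
const oneSharedObject = sharedObjectSeconds(sameHour);        // 3600
```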

What exactly is the model for eviction? Can my Worker get evicted mid execution? Can my Durable Object Worker get evicted while idle but holding a bunch of open connections? (Is that why there needs to be logic to rejoin on the client side? Because the Durable Object can get killed at any point?)

In general, you should design your code to be able to handle any Worker (including ones running Durable Objects) being randomly evicted at any time. This is generally true of any distributed system. Most of the time, this just means retrying / reconnecting whenever something goes wrong.
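
On the client side, "reconnecting whenever something goes wrong" can be as simple as the sketch below. Exponential backoff is my own choice here, not something the platform requires, and `ReconnectPolicy` is a hypothetical helper rather than an existing API.

```typescript
// Tracks how long to wait before the next reconnect attempt.
// Delays double on each consecutive failure, up to a fixed ceiling.
class ReconnectPolicy {
  private attempt = 0;
  private readonly baseMs: number;
  private readonly maxMs: number;

  constructor(baseMs: number, maxMs: number) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
  }

  // Delay to wait before the next attempt; call once per failed or closed connection.
  nextDelayMs(): number {
    const delay = Math.min(this.baseMs * Math.pow(2, this.attempt), this.maxMs);
    this.attempt += 1;
    return delay;
  }

  // Call once a connection has been established successfully.
  reset(): void {
    this.attempt = 0;
  }
}
```

A client would typically call `reset()` in the socket's `open` handler and schedule a new connection attempt after `nextDelayMs()` in its `close` handler. After reconnecting, the client should re-send whatever the server needs to restore its session (for example, which room it is joining), since an evicted object loses its in-memory state.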

With that said, the intent is that Workers will usually only be evicted when there are no clients connected.

Thank you! This clears up a lot of things.

However, there is a cap on this billing at 30s. If you use more than 30s for a single request, the additional time is not charged.

Also, I double-checked the documentation, and I actually don't think this is mentioned anywhere. I looked at the limits and pricing pages, searched (ctrl-F) for "30", and could not find any reference that says billing caps at 30s of wall time.

I actually don't think this is mentioned anywhere

Yeah... There's a constant debate between "explain everything in complete detail" and "don't overwhelm the user" which makes it pretty hard to settle on exactly what should be documented and what shouldn't be. :/ It might help if you file an issue on the docs repo asking for clarification -- direct questions from users carry more weight than engineers and PMs debating in the abstract. :)

At the same time, there is a 30s limit per request on CPU time. After you've consumed 30s of CPU time, the request is forcefully canceled. (The marketing screen shot does seem to misstate this point...)

Yeah, that looks just straight up wrong. I'd go fix it myself but can't see where it's defined in either an internal repo or external repo, so I guess I'll just raise it on Monday.

Note that this means allocating one Durable Object for every WebSocket connection is currently rather expensive. We recommend an architecture where many WebSockets terminate at a single DO.

Do you still recommend this after the release of the Hibernation API? I'm thinking about allocating each user one DO (still approximately one WS per DO, since users don't normally open multiple tabs). I'm still leaning toward sharing a DO between users to save on cost, but if the saving isn't worth it then I should choose the simpler architecture.

Do you still recommend this after the release of the Hibernation API?

No. If you use hibernation properly then there's little benefit to trying to aggregate connections, because you are only billed for the time that your object is actively working on an event. Presumably you have the same number of events with either design, so there's not likely a huge difference between them being handled in a single object vs. many objects.
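
For readers who haven't seen it, here is a rough sketch of the shape hibernation-friendly code takes: the object hands each socket to the runtime and only does work inside an event handler. The `SocketLike` and `HibernationStateLike` interfaces are local stand-ins modeled on the Durable Object state methods `acceptWebSocket()` and `getWebSockets()`, so that the example runs outside the Workers runtime; `RoomSketch` and its `accept()` method are hypothetical. Check the Durable Objects documentation for the real types and the full handler set.

```typescript
// Local stand-ins, assumed to mirror the relevant parts of the runtime API.
interface SocketLike {
  send(message: string): void;
}
interface HibernationStateLike {
  acceptWebSocket(ws: SocketLike): void;
  getWebSockets(): SocketLike[];
}

class RoomSketch {
  private readonly state: HibernationStateLike;

  constructor(state: HibernationStateLike) {
    this.state = state;
  }

  // Called once per new connection. The socket is handed to the runtime
  // instead of being stored in a field with its own event listeners,
  // so the object keeps no per-socket state in memory.
  accept(ws: SocketLike): void {
    this.state.acceptWebSocket(ws);
  }

  // Event handler: the only code that runs when a message arrives.
  // Peers are looked up from the runtime on every event, because any
  // in-memory list would be lost while the object is hibernating.
  webSocketMessage(sender: SocketLike, message: string): void {
    for (const peer of this.state.getWebSockets()) {
      if (peer !== sender) {
        peer.send(message);
      }
    }
  }
}
```

The point of this shape is that between events there is nothing the object needs to stay awake for, which is what lets billing track event handling time rather than connection time.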

Meanwhile, Durable Objects work best when you make them as fine-grained as possible, as this allows each object to live near the user that owns it and avoids scalability issues from objects being single-threaded.