electric-sql/electric

Requests to the beginning of the shape log (offset=-1) should have a really long cache


Perhaps 1 month for the max-age and 3 months more for SWR

cache-control: public, max-age=2629746, stale-while-revalidate=7889238

It's dramatically faster when the browser pulls shapes out of its own cache, almost as fast to read them from the CDN, and slowest by far to read them from Electric itself.

We can do this as shape logs are immutable and append-only. The worst-case scenario is that a shape is requested when it has no rows and then grows a lot, but even this isn't that bad, as a client can make multiple requests in a row fairly quickly and would still get the bulk of the data in the second request.

Also in the worst-case scenario, we could add a way for Electric to purge shape paths in caches if they're badly out-of-date.

Related to #1447

Also related to improving chunking, as a stable initial chunk would mean that caches don't need to expire.

OK, thinking about this more: we essentially want the user's private cache to keep the segment of the shape log ~forever. Browsers store the parsed JSON, so loading that from cache is very fast, e.g. 1MB of cached JSON on disk loads on my laptop in roughly 20-40ms. Far faster than a browser could ever load from the CDN.

And even if it ends up that, for a long-lived shape, a browser has cached offsets 1-10k, then 10k-57k, then 57k-123k or whatever, all of those will load in ~100ms. So it's very fast to get the full sequence, and better than going back to reload 1-123k from scratch.

So we want max-age to be set to a very high number so browsers cache JSON segments.

But we don't want shared HTTP caches to cache for that long, because when a client is loading a log from scratch, it is far better for them to load complete logs vs. loading one segment and then going back to grab another one, etc. (though we do still have some nice optimizations here, e.g. immediately starting to fetch the next segment from the offset in the headers).

So we want to use s-maxage for this, as that controls the TTL in shared caches like CDNs. We could set max-age for private caches in clients to 3 months, s-maxage for shared caches to 4 hours, and stale-while-revalidate also to 3 months, so the shared cache can still respond quickly with old but rarely fetched shape logs while re-validating them frequently against Electric, meaning shapes that change frequently stay mostly up-to-date on the shared cache.
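Concretely, that split could look something like this (illustrative values: 3 months ≈ 7889238 seconds, 4 hours = 14400 seconds):

cache-control: public, max-age=7889238, s-maxage=14400, stale-while-revalidate=7889238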

We could have heuristics for cache length, e.g. how many rows there are in the segment (more means a longer cache), how many duplicate operations there are (more means a shorter cache, e.g. a row that got updated 500 times and needs compacting), whether it's the segment for the initial query (this should be highly cacheable), etc.
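A rough sketch of what such a heuristic could look like (the segment stats and thresholds here are illustrative assumptions, not what Electric actually computes):

```ts
// Hypothetical segment stats, for illustration only.
interface SegmentStats {
  rowCount: number          // rows in this log segment
  duplicateOps: number      // operations touching rows already in the segment
  isInitialSegment: boolean // is this the offset=-1 snapshot segment?
}

// Pick a private-cache max-age (in seconds) for a segment.
// More rows => more stable => cache longer.
// More duplicate operations => wants compaction => cache shorter.
function maxAgeFor(stats: SegmentStats): number {
  const HOUR = 3600
  const WEEK = 7 * 24 * HOUR

  if (stats.rowCount === 0) return 60                  // empty snapshot: barely worth caching
  if (stats.duplicateOps > stats.rowCount) return HOUR // churn-heavy: re-fetch soon
  if (stats.isInitialSegment) return 12 * WEEK         // initial query: highly cacheable
  return stats.rowCount > 10_000 ? 4 * WEEK : WEEK
}
```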

I think it's very safe to extend our caches out to weeks, and then we could do a lot of testing on what happens in different scenarios to build our confidence about which types of log segments are cacheable for a very long time.

If the offset -1 initial sync gets the response in max-chunk-size responses (i.e. 10MB chunks), then can those chunks also have a long shared max-age? Because we don't need the CDN to refetch to consolidate the initial sync, since it's already optimal at that point?

@thruflo right, yeah, that's a good heuristic: if there's a lot of data then it'll be more stable than a small amount of data. An initial sync might also get no rows because the data hasn't been created yet. In that case it's not a useful cache entry, and we do want both shared and private caches to refetch so they pick up the data once there is a lot of it.

An intermediate step is that requests with offset -1 get a long max-age and we never return up-to-date on this request, while all other offsets do return up-to-date (if there's no chunking) with a short max-age.
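For example (purely illustrative values, reusing the numbers from above), the offset=-1 response could be served with

cache-control: public, max-age=2629746, stale-while-revalidate=7889238

while responses for later offsets return up-to-date and get something much shorter, e.g.

cache-control: public, max-age=60, stale-while-revalidate=300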

We can improve this in the future by checking whether a shape log segment is old/stable even when it doesn't start at -1, but this seems like the easiest win in the short term for the lowest effort.

I guess the trade-off is that it will always take at least two requests to render a new shape (one for -1 and another to get an up-to-date).

In the browser, if the shape+offset are persisted it renders refetching from -1 a bit moot. As in, why would you resync the shape if you have it already. On the other hand, just auto using the browser cache as a replacement for needing persistence is pretty wild. It's like magic offline.

Right, yeah, it's at least two requests on the initial fetch of a shape and then one fetch when -1 is cached in the browser. But these are still fairly cheap, as @msfstef's prefetch PR means we start fetching the next chunk of the shape as soon as the headers arrive. Given it takes some time to download the body, sometimes the up-to-date chunk will have arrived before the -1 offset chunk has finished downloading.
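As a rough illustration of why that pipelining helps: fetch() resolves as soon as the response headers arrive, so the request for the next chunk can start while the current body is still downloading. The header names and URL shape below are assumptions for the sketch, not the exact client implementation.

```ts
// Sketch of pipelined shape-log fetching. Header names ("electric-offset",
// "electric-up-to-date") and the URL shape are assumed for illustration.
async function fetchLog(shapeUrl: string): Promise<void> {
  // Start with the initial snapshot request (offset=-1).
  let next: Promise<Response> | null = fetch(`${shapeUrl}&offset=-1`)

  while (next) {
    const res = await next // resolves when headers arrive; body may still be streaming
    next = null

    const nextOffset = res.headers.get('electric-offset')
    const upToDate = res.headers.has('electric-up-to-date')

    // Kick off the next chunk's request in parallel with downloading this body.
    if (nextOffset && !upToDate) {
      next = fetch(`${shapeUrl}&offset=${nextOffset}`)
    }

    const messages = await res.json() // download and parse this chunk's body
    // ...apply `messages` to local state...
  }
}
```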

just auto using the browser cache as a replacement for needing persistence is pretty wild. It’s like magic offline.

Yeah! The browser cache is super fast. For truly offline use or for tons of data you'll want IndexedDB/OPFS, but most of the time I suspect just heavily leveraging the browser cache is the fastest/cheapest option.

It just occurred to me that in some cases you will not want compaction to start from the beginning of the shape log to preserve the cached chunks and only compact later changes.

It just occurred to me that in some cases you will not want compaction to start from the beginning of the shape log to preserve the cached chunks and only compact later changes.

It's actually fine if you do compact the earlier stuff. Because the log is monotonic, everything still works if the earlier log is compacted; e.g. a client that grabbed the initial chunk 3 months ago can still keep grabbing the new log entries (compacted or not) that come later on.

We want client caches to stay intact as long as possible, but we also want new clients to get the most compact log possible. So set the max-age to a long period and s-maxage to a short period.

@KyleAMathews should we close this now that the relevant PR has been merged?

Yup, the follow-up work is covered by #1447

a client that grabbed the initial chunk 3 months ago can still keep grabbing the new log entries (compacted or not) that come later on.

I am thinking of patterns where the tail of the shape is quiet but it's really active at the tip, e.g. activity on a small subset of new keys. It seems it would make sense to be compacting closer to the tip. We always want to make the caches of -1 last, but if we're successful, it means a lot of people will be resuming from arbitrary points, which will hit the server more frequently, right?

Note that time spent compacting suffixes of the log will still be well-used compute time when you later want to incorporate those chunks into the earlier chunks of the log, because you'll still be reducing the number of comparisons.

We have a lot of options for playing with how we compact and cache later segments with #1447, e.g. if a lot of people are fetching 2-day-old offsets, those can be cached longer & compacted so people get efficient catch-ups.