Cache does not work if timeout is set to None
gwenya opened this issue · 9 comments
According to the Django documentation, a value of None
for the cache timeout should mean that entries in the cache never expire (see https://docs.djangoproject.com/en/5.0/ref/settings/#std-setting-CACHES-TIMEOUT).
Using this appears desirable for wagtail cache, if we use cache invalidation, since there is no reason for a page to be re-rendered every hour/day/month/year/whatever.
However, wagtail cache will just not store anything into the cache if the cache timeout is None:
wagtail-cache/wagtailcache/cache.py
Line 299 in 79638e9
There is also an exception in the admin on the cache settings page because the code for formatting the timeout in a human-readable way cannot deal with None-values:
It is of course possible to just set the timeout to an arbitrarily high value, but supporting None
seems nicer.
Thanks for pointing this out. Our implementation does deviate from the Django behavior.
Since wagtail-cache sets HTTP Expires
and Cache-Control max-age
headers, and those must be a date in the future, it is not possible to support indefinite expiry. For example, the server will tell the browser that this web page expires on a certain date and time in the future. Usually this is a few hours, or a few days. Of course, if you change anything on your website, the user will not see these changes until the Expires date/time. So it is usually not advisable to cache it too long. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires.
The Django cache is more of an internal cache, it is not doing a front-end HTTP cache based on that. So that's why they are able to support the None value.
I think we should update our documentation to indicate that this cannot be None, since that deviates from the Django behavior.
Hmh on the one hand that makes sense, on the other hand it appears that wagtail-cache is doing two completely unrelated things and mixing them up a bit then. Ideally there would then be two separate timeout values, for example one might want to cache responses on the server indefinitely so that they don't need to be rerendered unless the data changes, but have much shorter caching on the client (or none at all) since the browser cache cannot be easily invalidated.
The only sensible way for pages to be cached on the client for my use case (and I think many others) is with etags. I will investigate whether it is feasible to do this with wagtail-cache.
Good idea about the separate settings. Looks like the Django cache middleware uses a separate setting for this: https://docs.djangoproject.com/en/5.0/ref/settings/#cache-middleware-seconds
Etags are also a great idea! On one hand, we could probably use the wagtail revision history for this. However, things like images, static files, etc. might make this more complicated (i.e. what if a page is not changed, but one of the images on the page is changed; what if page content is not changed, but the code/template is changed?).
I don't think the revision history works for that, as you say there's the images but also page templates can include data from other pages so then you'd have to take those into account as well. Code/template changes on the other hand seem fairly easy to deal with, if the etags include some piece of information that changes on every deployment (e.g. some random number that gets generated via a manage.py command).
I feel like this is definitely a thing that cannot work out of the box and requires additional work for integrating, like user-provided functions for calculating the etag.
I am thinking the easiest way perhaps is for the cache to store the etag as a hash of the response next to the response, then when a If-None-Match request comes in it could check if there's a cache entry and if so compare the etag. The same could also be done with a timestamp for If-Modified-Since requests (using the timestamp of when the response is stored in the cache, so that when the cache is invalidated the browser will get a new version).
All good ideas. Yea it will end up being complicated to calculate etags. My philosophy is to set the cache TTL for like 24 hours. Media TTL for 7 days (usually served separately from wagtail). Even if you have 10,000 pages, they're only getting rendered once per day, and it's still a super-low server load.
If you come up with any good ideas for etags, feel free to post them here, or work up a concept pull request. I'd be happy to take a look.
In the mean time, we should either update the documentation about the timeout, or use a separate setting.
Yeah I'm not really worried about server load, I might just be over-optimising a bit :D
A separate setting would really be ideal.
But I will look into the etags, because I'm really curious about how to make that work in a sensible way.
I am noticing that my browser does not actually seem to cache the page responses, only resources. Presumably this is mainly relevant for caching proxies like varnish?
Working on #71 has let me to realise that there is an issue with the Cache-Control
and Expires
headers.
Since these headers are set before adding the response to the cache, and then not set again after retrieving it, a response that is served from the cache will have an Expires
header that does not match the max-age
in the Cache-Control
header.
Without #71, this means that when the cache timeout is set to e.g. 24 hours, then a request that comes in 23 hours after a matching response was cached will get that cached response with an Expires
header that indicates that the response expires already one hour later, but with a Cache-Control
header that indicates that the response is valid for another 24 hours. When a max-age
in Cache-Control
is present then clients should ignore the Expires
header, so this will lead to the client caching the response for 24 hours. Therefore this response in the client is still considered valid 47 hours after it was initially cached, despite the cache timeout being set to 24 hours.
With the cache timeout and max-age
being decoupled, this might be less of an issue since a lower max-age
can be used, but the discrepancy between Expires
and max-age
remains.
Perhaps the FetchFromCacheMiddleware
should adjust the Expires
header on a retrieved response?
Oh it seems that is what the Age
header is for, so that is probably what should be set.