Prevent caching based on response header
sandros94 opened this issue · 12 comments
So, long story short: I'm trying to accelerate an old Drupal 7 website where more than 95% of the traffic comes from anonymous users and the rest is mostly admins, editors and publishers working on the platform.
The problem is that Drupal 7 doesn't use an `Authorization` header, so responses for any logged-in user will be cached.
While reading #34 I thought that a good solution would be to cache based only on the response header, since the bulk of public content is pre-rendered by the CMS and tagged accordingly via headers. But to my understanding there is no way in Caddy to run a matcher on the response, because by then it is already too late.
A regex based on path isn't an option, since the path of unpublished content is the same as when it would be published.
But I also have to admit that I still don't fully understand how Souin works, in particular what the cache keys are. Reading this and this, it almost seems that caching based only on response headers is already supported; it's just that I don't understand how to do it.
I think you can respond with `Cache-Control: no-cache`?
> I think you can respond with `Cache-Control: no-cache`?
I was indeed expecting that behavior from Drupal, but it doesn't seem consistent. I'm investigating that side as well.
Here is an update, so that you can judge whether to close this as not planned or leave it open for a potential future implementation (I would have liked to open a draft PR, but I'm not yet that familiar with Go).
What I've found
Indeed, Drupal 7 (well, I've only tested 7.97) does add `Cache-Control: no-cache` to the HTML page for logged-in users, but it also adds `Cache-Control: max-age=N` to the related JS for them. As a result the page itself is displayed correctly, but things like form submissions and interactivity break.
On further investigation this looks like a design decision rather than a bug: Drupal essentially creates an internal, volatile cache of each user's JS that is ready to be delivered, instead of regenerating it with PHP on each request (which could be quite expensive).
Huh? Users have different JS payloads sent to them? That's unusual. Typically you have a single JS bundle sent to the client (or multiple files split up, depending on your bundler config) regardless of authentication, and config is set in the HTML on `window` ahead of time for the JS to read, to change how the JS behaves.
> That's unusual
Indeed, and I think this could be down to a Drupal setting that aggregates multiple CSS/JS files together. The issue is definitely in the AJAX/jQuery code and how it is bundled, but I'm more interested in seeing what a more modern version like Drupal 10 does (I should also look at modules too).
@sandros94 does the frontend send a `SESSION` cookie?
> @sandros94 does the frontend send a `SESSION` cookie?
It doesn't. To my (limited) understanding, in Drupal 7 the session is fully handled server-side and it removes any indication of that particular session from headers and cookies. This feels strange and unusual (I'm used to Vue and mostly do things myself).
I still need to check whether the setting that pre-compresses and aggregates both CSS and JS could be what mixes up anonymous and authenticated sessions when using any external cache (since I could now simply use Caddy to compress on request).
> the session is fully handled server-side and it removes any indication of that particular session from headers and cookies
That's wrong. Cookies are used to identify a session in the backend storage. Otherwise the server would have no idea who the client is.
Sometimes it's a cookie like `PHPSESSID` or whatever. It depends on the framework etc.
Drupal adds a session cookie in the browser for authenticated users (e.g. `SESS49960de5880e8c687434170f6476605b`). So you can use a Caddy matcher to detect whether the request has a cookie with the prefix `SESS`: if it does, don't use the cache; otherwise use it.
Maybe something like that would work:
@authCookie `header({'Cookie':'*SESS*'})`
route @authCookie {
	# only php_fastcgi
}
route {
	cache
	php_fastcgi
}
Or simpler:
@noAuthCookie not header Cookie *SESS*
cache @noAuthCookie
So cache is only used when there's no session cookie.
AH! Thank you so much for the explanation!
I quickly found the `SESS` cookie. I'll test the suggested config soon and post an update.
UPDATE 1
This made me understand a bit more how Drupal 7.97 works.
Long story short: the suggested config, at least in a first test, seems to have solved the content editing problems I was facing in the beginning (forms not posting/saving, images not uploading).
What I also understood is that Drupal 7 doesn't regenerate the CSS and JS on a per-user basis but on a per-cache-cycle basis. All CSS and JS files share the same query string, derived from the current cache cycle, which is used as a cache buster.
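As a rough illustration of why such a query string acts as a cache buster, the Go sketch below builds a hypothetical cache key from the method, host, path and query string. The key format and the `cacheKey` name are assumptions made for this example; Souin and other caches define their own key formats.

```go
package main

import "net/url"

// cacheKey builds a hypothetical cache key from the request method and URL.
// Because the query string is part of the key, the same asset path requested
// with a different cache-cycle token maps to a different cache entry.
func cacheKey(method, rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return method + "-" + u.Host + "-" + u.Path + "?" + u.RawQuery, nil
}
```

With a key like this, HTML cached during one cycle can still reference assets by the old token after Drupal has moved on to a new cycle, which is one way the cached page and its resources can drift apart.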
This gets worse when using aggregated content (under `/admin/config/development/performance`), where each group of CSS/JS gets a multi-string (a combination of the current page and the cache cycle). That increases the chance that the resources are mismatched or unavailable compared to the ones referenced in the initial HTML (sometimes rendering broken CSS). When this mismatch happens (the frontend receives a request for only a sub-group and not the full page load), Drupal tries to regenerate those caches and more often than not serves an unfinished one, corrupting the displayed resources.
As a next step, I need to understand whether this Drupal version/project can even have a cache/CDN in front of it.
@sandros94 we're closing this issue, reopen it if needed.