caddyserver/cache-handler

Cannot match current iteration key ETag

jpds opened this issue · 9 comments

jpds commented

I have this plugin configured as follows:

{
	on_demand_tls {
		ask http://localhost:3903/check
		interval 2m
		burst 5
	}

	order cache before rewrite

	cache {
		allowed_http_verbs GET
		default_cache_control public
		nuts {
			path /var/db/caddy/souin/
		}
		ttl 8h
	}
}

https:// {
	cache {
		timeout {
			backend 30s
		}
	}

	tls {
		on_demand
	}

	reverse_proxy localhost:3902
}

However, every time I hit one of the domains fronted by this configuration, I see this in the logs:

$ curl -v https://www.domain.com/
...
 TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< accept-ranges: bytes
< alt-svc: h3=":443"; ma=2592000
< cache-control: public
< cache-status: Souin; fwd=uri-miss; stored; key=GET-https-www.domain.com-/
< content-security-policy: default-src 'self';
< content-type: text/html
< date: Sun, 30 Jul 2023 22:46:54 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
...

And the ETag never changes between curl runs, however in the logs I then see:

{"level":"info","ts":1690757113.7676728,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757115.2038972,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757116.2018719,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757117.201102,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757118.0917957,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}

Hello @jpds thank you for this feedback and the detailed issue.
AFAIK we're not handling the case the stored response contains an ETag and the client doesn't send one. We will fix that asap.

jpds commented

I built the linked PR with:

$ xcaddy build v2.7.3 --with github.com/darkweak/souin@7cc8ae262d11cc1788e0f2c5974af40d717452d8 --with github.com/caddyserver/cache-handle
...
go: added github.com/darkweak/souin v1.6.40-0.20230810001327-7cc8ae262d11
...
go: added github.com/caddyserver/cache-handler v0.8.0

However, the ETag still does not seem to be cached, I tripled tested this with Firefox/Chromium/curl:

{"level":"debug","ts":1691685939.2154143,"logger":"http.handlers.cache","msg":"Incomming request &{Method:GET URL:/ Proto:HTTP/3.0 ProtoMajor:3 ProtoMinor:0 Header:map[Accept:[text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8] Accept-Encoding:[gzip, deflate, br] Accept-Language:[en-US,en;q=0.5] Alt-Used:[www.domain.com] If-Modified-Since:[Thu, 02 Feb 2023 12:56:22 GMT] If-None-Match:[\"636849236767b0deb94dd80d04c5bc74\"] Priority:[u=1] Sec-Fetch-Dest:[document] Sec-Fetch-Mode:[navigate] Sec-Fetch-Site:[none] Sec-Fetch-User:[?1] Upgrade-Insecure-Requests:[1] User-Agent:[Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0]] Body:0x8656ce1e0 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:[2a0c:XXXX:XX::XXXX]:59406 RequestURI:/ TLS:0x864de7b80 Cancel:<nil> Response:<nil> ctx:0x86561b560}"}
{"level":"debug","ts":1691685939.2157462,"logger":"http.handlers.cache","msg":"Request cache-control &{MaxAge:-1 MaxStale:-1 MaxStaleSet:false MinFresh:-1 NoCache:false NoStore:false NoTransform:false OnlyIfCached:false StaleIfError:0 Extensions:[]}"}
{"level":"debug","ts":1691685939.215986,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"debug","ts":1691685939.2161043,"logger":"http.handlers.cache","msg":"Request the upstream server"}
{"level":"debug","ts":1691685939.2208962,"logger":"http.handlers.reverse_proxy","msg":"selected upstream","dial":"localhost:3902","total_upstreams":1}
{"level":"debug","ts":1691685939.2755642,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"localhost:3902","duration":0.054328866,"request":{"remote_ip":"2a0c:XXXX:XX::XXXX","remote_port":"59406","client_ip":"2a0c:XXXX:XX::XXXX","proto":"HTTP/3.0","method":"GET","host":"www.domain.com","uri":"/","headers":{"Date":["Thu, 10 Aug 2023 16:45:39 UTC"],"Priority":["u=1"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"],"Sec-Fetch-Site":["none"],"X-Forwarded-Proto":["https"],"Upgrade-Insecure-Requests":["1"],"Sec-Fetch-User":["?1"],"Accept-Encoding":["gzip, deflate, br"],"Sec-Fetch-Dest":["document"],"X-Forwarded-Host":["www.domain.com"],"Sec-Fetch-Mode":["navigate"],"If-None-Match":["\"636849236767b0deb94dd80d04c5bc74\""],"If-Modified-Since":["Thu, 02 Feb 2023 12:56:22 GMT"],"Accept-Language":["en-US,en;q=0.5"],"Alt-Used":["www.domain.com"],"X-Forwarded-For":["2a0c:XXXX:XX::XXXX"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"]},"tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h3","server_name":"www.domain.com"}},"headers":{"Date":["Thu, 10 Aug 2023 16:45:39 GMT"]},"status":304}

curl always returns:

< cache-control: public, max-age=86400
< cache-status: Souin; fwd=uri-miss; stored; key=GET-https-www.domain.com-/
< date: Thu, 10 Aug 2023 16:55:39 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT

@jpds I updated the codebase in the PR and added a unit test about this feature. 🤞

jpds commented

@darkweak Thank you very much for the fix. I can confirm that the logs say:

{"level":"debug","ts":1691712666.6366026,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ matched the current iteration key ETag GET-https-www.domain.com-/"}

However, every request now seems to request revalidation:

< cache-control: public, max-age=86400
< cache-status: Souin; fwd=request; fwd-status=200; key=GET-https-www.domain.com-/; detail=REQUEST-REVALIDATION
< content-type: text/html
< date: Fri, 11 Aug 2023 00:38:44 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT

Is there a configuration option I'm missing?

Edit: I've also just discovered that the backend I'm using also does caching and here's the logic they use: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/67b1457c77d930cd12f83f5cf07ec7f2376fa00c/src/api/s3/get.rs#L52-L89

@jpds I updated the debug message to display the current validator (that contains the ETagResponse and the ETagRequest) with that you'll be able to find if there is an ETag mismatch.

jpds commented

None of my client requests will be sending ETags, I believe - here's the new log output:

{"level":"debug","ts":1691742614.1066208,"logger":"http.handlers.cache","msg":"Incomming request &{Method:GET URL:/ Proto:HTTP/2.0 ProtoMajor:2 ProtoMinor:0 Header:map[Accept:[*/*] User-Agent: [curl/8.0.1]] Body:0x8743a4210 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:[2a0c:XXXX:XX::XXXX]:55244 RequestURI:/ TLS:0x853eea4d0 Cancel:<nil> Response:<nil> ctx:0x874332ed0}"}
{"level":"debug","ts":1691742614.106806,"logger":"http.handlers.cache","msg":"Request cache-control &{MaxAge:-1 MaxStale:-1 MaxStaleSet:false MinFresh:-1 NoCache:false NoStore:false NoTransform:false OnlyIfCached:false StaleIfError:0 Extensions:[]}"}
{"level":"debug","ts":1691742614.1069987,"logger":"http.handlers.cache","msg":"The stored key GET-https-www.domain.com-/ matched the current iteration key ETag &{Matched:true IfNoneMatchPresent:false IfMatchPresent:false IfModifiedSincePresent:false IfUnmodifiedSincePresent:false IfUnmotModifiedSincePresent:false NeedRevalidation:true IfModifiedSince:0001-01-01 00:00:00 +0000 UTC IfUnmodifiedSince:0001-01-01 00:00:00 +0000 UTC IfNoneMatch:[] IfMatch:[] RequestETag: ResponseETag:\"636849236767b0deb94dd80d04c5bc74\"}"}
{"level":"debug","ts":1691742614.1070535,"logger":"http.handlers.cache","msg":"Revalidate the request with the upstream server"}
{"level":"debug","ts":1691742614.1091118,"logger":"http.handlers.reverse_proxy","msg":"selected upstream","dial":"localhost:3902","total_upstreams":1}
{"level":"debug","ts":1691742614.1201108,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"localhost:3902","duration":0.010700285,"request":{"remote_ip":"2a0c:XXXX:XX::XXXX","remote_port":"55244","client_ip":"2a0c:XXXX:XX::XXXX","proto":"HTTP/2.0","method":"GET","host":"www.domain.com","uri":"/","headers":{"User-Agent":["curl/8.0.1"],"Accept":["*/*"],"Date":["Fri, 11 Aug 2023 08:30:14 UTC"],"Etag":["\"636849236767b0deb94dd80d04c5bc74\""],"X-Forwarded-For":["2a0c:XXXX:XX::XXXX"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["www.domain.com"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"www.domain.com"}},"headers":{"Accept-Ranges":["bytes"],"Etag":["\"636849236767b0deb94dd80d04c5bc74\""],"Content-Length":["777"],"Date":["Fri, 11 Aug 2023 08:30:14 GMT"],"Content-Type":["text/html"],"Last-Modified":["Thu, 02 Feb 2023 12:56:22 GMT"]},"status":200}
{"level":"debug","ts":1691742614.1225867,"logger":"http.handlers.cache","msg":"Response cache-control &{MustRevalidate:false NoCache:map[] NoCachePresent:false NoStore:false NoTransform:false Public:true Private:map[] PrivatePresent:false ProxyRevalidate:false MaxAge:86400 SMaxAge:-1 Immutable:false StaleIfError:-1 StaleWhileRevalidate:-1 Extensions:[]}"}
{"level":"debug","ts":1691742614.1228776,"logger":"http.handlers.cache","msg":"Store the response {Status: StatusCode:200 Proto: ProtoMajor:0 ProtoMinor:0 Header:map[Accept-Ranges:[bytes] Alt-Svc:[h3=\":443\"; ma=2592000] Cache-Control:[public, max-age=86400] Content-Length:[777] Content-Security-Policy:[default-src 'self';] Content-Type:[text/html] Date:[Fri, 11 Aug 2023 08:30:14 GMT] Etag:[\"636849236767b0deb94dd80d04c5bc74\"] Last-Modified:[Thu, 02 Feb 2023 12:56:22 GMT] Permissions-Policy:[interest-cohort=()] Referrer-Policy:[no-referrer-when-downgrade] Server:[Caddy] Strict-Transport-Security:[max-age=31536000;] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Souin-Stored-Ttl:[24h0m0s]] Body:{Reader:...

All the ETags involved are coming from the backend server - hence I was wondering if there was some mismatched expectations in what the backend is proposing to Souin.

Edit: and the backend in question is a static S3 bucket with simple web content: https://garagehq.deuxfleurs.fr/documentation/cookbook/exposing-websites/ - hence I was hoping that Caddy could cache the latest copy of this content locally without having to ask the backend storage servers what was available.

jpds commented

I've been playing with curl ( https://daniel.haxx.se/blog/2019/12/06/curl-speaks-etag/ ) to try and better understand this and I've found:

With an ETag, curl bounces between REQUEST-REVALIDATION and UNCACHEABLE-STATUS-CODE:

# First attempt
$ curl -v --etag-compare etag.txt --etag-save etag.txt https://www.domain.com -o saved-file
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-none-match: ""]
* Using Stream ID: 1 (easy handle 0x55bb424d1350)
} [5 bytes data]
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-none-match: ""
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
< HTTP/2 200
< accept-ranges: bytes
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=86400
< cache-status: Souin; fwd=request; fwd-status=200; key=GET-https-www.domain.com-/; detail=REQUEST-REVALIDATION
< content-type: text/html
< date: Fri, 11 Aug 2023 10:00:35 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 777
<
{ [5 bytes data]
100   777  100   777    0     0   3870      0 --:--:-- --:--:-- --:--:--  3885

# Second attempt
$ curl -v --etag-compare etag.txt --etag-save etag.txt https://www.domain.com -o saved-file
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-none-match: "636849236767b0deb94dd80d04c5bc74"]
* Using Stream ID: 1 (easy handle 0x55bdb2de0350)
} [5 bytes data]
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-none-match: "636849236767b0deb94dd80d04c5bc74"
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
< HTTP/2 304
< alt-svc: h3=":443"; ma=2592000
< cache-status: Souin; fwd=uri-miss; key=GET-https-www.domain.com-/; detail=UNCACHEABLE-STATUS-CODE
< date: Fri, 11 Aug 2023 10:07:37 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 0
<
{ [0 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

If I use the older if-modified-since method, I then see Souin hits:

$ curl -v -z "11 Aug 2023" https://www.domain.com
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-modified-since: Fri, 11 Aug 2023 00:00:00 GMT]
* Using Stream ID: 1 (easy handle 0x563f2a52c350)
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-modified-since: Fri, 11 Aug 2023 00:00:00 GMT
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< accept-ranges: bytes
< age: 126
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=86400
< cache-status: Souin; hit; ttl=86274; key=GET-https-www.domain.com-/
< content-type: text/html
< date: Fri, 11 Aug 2023 10:07:35 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 777
<
* The requested document is not new enough
* Simulate an HTTP 304 response

Hey @jpds sorry for the delay.
I just updated the PR, let me know if there are other weird behaviours.

jpds commented

@darkweak Now it is all working - perfectly 🎉 . Seeing hits everywhere.