cactus/go-camo

Handling Cloudflare Challenges with go-camo

alexzeitgeist opened this issue · 5 comments

Specifications

Version: 2.4.3-2-ga397323
Platform: Debian Buster

Expected Behavior

Many sites sit behind Cloudflare, which can "challenge" outgoing requests from go-camo so that the proxied fetch fails. Ideally, go-camo could be configured with fallback servers to retry these challenged requests. Cloudflare challenges often occur when the go-camo server's IP is temporarily "blacklisted". If go-camo could pass the request to another instance on a different server with a different IP, that instance would be more likely to pass through Cloudflare unchallenged.

Actual Behavior

For example, the image https://www.globus.ch/cf-media/akeneo/2000402238436_FP_PNG_1/1680537905/1200.png was requested, and Cloudflare "challenged" the request (Cf-Mitigated: challenge, see https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#detecting-a-challenge-page-response):

Jun 06 08:18:09 proxy-host go-camo-netgo[16037]: time="2023-06-06T08:18:09.303956624-04:00" level="D" msg="signed client url" url="URL https://www.globus.ch/cf-media/akeneo/2000402238436_FP_PNG_1/1680537905/1200.png was requested."
Jun 06 08:18:09 proxy-host go-camo-netgo[16037]: time="2023-06-06T08:18:09.307941800-04:00" level="D" msg="built outgoing request" req="content_length=\"0\" transfer_encoding=\"[]\" host=\"www.globus.ch\" remote_addr=\"\" method=\"GET\" path=\"\" proto=\"HTTP/1.1\" header=\"map[Accept:[image/*] Accept-Language:[en-US,en;q=0.9,de;q=0.8] Cache-Control:[no-cache] User-Agent:[Camo Asset Proxy] Via:[Camo Asset Proxy]]\""
Jun 06 08:18:09 proxy-host go-camo-netgo[16037]: time="2023-06-06T08:18:09.358322679-04:00" level="D" msg="response from upstream" content_length="-1" header="map[Alt-Svc:[h3=\":443\"; ma=86400] Cache-Control:[private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0] Cf-Mitigated:[challenge] Cf-Ray:[7d3098a84c5600a8-CDG]Content-Type:[text/html; charset=UTF-8] Cross-Origin-Embedder-Policy:[require-corp] Cross-Origin-Opener-Policy:[same-origin] Cross-Origin-Resource-Policy:[same-origin] Date:[Tue, 06 Jun 2023 12:18:09 GMT] Expires:[Thu, 01 Jan 1970 00:00:01 GMT] Permissions-Policy:[accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()] Referrer-Policy:[same-origin] Server:[cloudflare] Strict-Transport-Security:[max-age=15552000; preload] X-Frame-Options:[SAMEORIGIN]]" proto="HTTP/1.1" status="403" transfer_encoding="[chunked]"

As a result, go-camo isn't able to proxy the image.
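
For reference, the challenge in the log above can be recognized from two fields of the upstream response: the 403 status and the Cf-Mitigated: challenge header. A minimal sketch of that check in Go follows. The function name isCloudflareChallenge is made up for illustration and is not part of go-camo, and the check assumes challenges always arrive with status 403, as in this log.

```go
package main

import (
	"fmt"
	"net/http"
)

// isCloudflareChallenge is a hypothetical helper (not go-camo code). It reports
// whether a response looks like the challenge in the log above: status 403
// together with a "Cf-Mitigated: challenge" header.
func isCloudflareChallenge(status int, h http.Header) bool {
	return status == http.StatusForbidden && h.Get("Cf-Mitigated") == "challenge"
}

func main() {
	// Headers resembling the challenged response from the log.
	challenged := http.Header{}
	challenged.Set("Cf-Mitigated", "challenge")
	challenged.Set("Content-Type", "text/html; charset=UTF-8")

	// An ordinary 403 without the Cloudflare marker.
	plain := http.Header{}
	plain.Set("Content-Type", "text/html")

	fmt.Println("challenged 403:", isCloudflareChallenge(403, challenged))
	fmt.Println("plain 403:     ", isCloudflareChallenge(403, plain))
	fmt.Println("challenged 200:", isCloudflareChallenge(200, challenged))
}
```

http.Header.Get canonicalizes the header name, so the lookup matches regardless of how the name is capitalized on the wire.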

go-camo should treat the 403 response from Cloudflare as a temporary error and send a 404 back to the caller. I assume that is what happens now; if not, that is likely a bug.

I could see optionally passing back the Cloudflare Cf-Mitigated: challenge header, enabled with a CLI flag, if that behavior is desirable.

As far as trying to avoid the "blacklisting" itself, a couple of possible solutions:

  • First, you can run multiple go-camo instances and load balance across them.

  • Another option: go-camo supports using an upstream HTTP/HTTPS proxy, which could help spread upstream request load across several upstream proxy servers. Note that I have only tested with a single upstream HTTP/HTTPS proxy instance (smokescreen, tinyproxy, mitmproxy, etc.). It should work if DNS returns multiple IPs for the upstream proxy hostname, likely with caveats, as TCP keepalive would prefer existing connections.

    You could also use haproxy in front of a group of upstream proxy instances for load balancing, though that is definitely more of an infrastructure lift.
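
To illustrate the idea of spreading requests across several upstream proxies, here is a rough sketch of how a plain Go HTTP client can rotate between proxy URLs through the Proxy hook of http.Transport. This is not how go-camo configures its upstream proxy; the helper names roundRobinProxy and mustParseAll and the proxy addresses are invented for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobinProxy returns a function usable as http.Transport.Proxy that
// cycles through the given proxy URLs, advancing once per call.
func roundRobinProxy(proxies []*url.URL) func(*http.Request) (*url.URL, error) {
	var next uint64
	return func(*http.Request) (*url.URL, error) {
		if len(proxies) == 0 {
			return nil, nil // no proxy configured: connect directly
		}
		i := atomic.AddUint64(&next, 1) - 1
		return proxies[i%uint64(len(proxies))], nil
	}
}

// mustParseAll parses each raw URL and panics on the first invalid one.
func mustParseAll(raw ...string) []*url.URL {
	out := make([]*url.URL, 0, len(raw))
	for _, r := range raw {
		u, err := url.Parse(r)
		if err != nil {
			panic(err)
		}
		out = append(out, u)
	}
	return out
}

func main() {
	// Hypothetical upstream proxy addresses; replace with real ones.
	proxies := mustParseAll("http://proxy-a.example:3128", "http://proxy-b.example:3128")
	transport := &http.Transport{Proxy: roundRobinProxy(proxies)}
	client := &http.Client{Transport: transport}
	_ = client // a real program would issue its requests through this client

	// Show the rotation without touching the network.
	for i := 0; i < 3; i++ {
		u, _ := transport.Proxy(nil)
		fmt.Println(u.Host)
	}
}
```

The transport pools keep-alive connections per proxy, so rotating per request does not by itself defeat connection reuse.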

Aside from that, I'm not sure what else go-camo can do here.

Thoughts?

Hi,

> go-camo should treat the 403 response from Cloudflare as a temporary error and send a 404 back to the caller. I assume that is what happens now; if not, that is likely a bug.

Exactly, this is what happens. Proxied images that don't pass Cloudflare appear as 404 not found.

> First, you can run multiple go-camo instances and load balance across them.

Exactly, this is perhaps the best solution, particularly if the instances are located on different networks. How difficult would it be to have the load balancer switch to another instance on demand when the current instance faces a Cloudflare challenge? You have probably already answered this question:

> I could see optionally passing back the Cloudflare Cf-Mitigated: challenge header, enabled with a CLI flag, if that behavior is desirable.

I am not sure how haproxy (which you mention) or similar tools work, but this could potentially work: go-camo returns a failure when it sees a Cf-Mitigated: challenge header, and that failure tells the load balancer to try another configured go-camo instance, and so on.

Cheers,
Alex

I think something like this (patch below) on the go-camo side might work. This would pass the cf-mitigated header back to the caller, as well as return a 403 instead of a 404. A downstream proxy (haproxy, nginx, etc) could be configured to retry against another instance if it saw this combination. Not sure what else go-camo can do here without adding significant additional complexity.

I'm also hesitant to include this in the main releases of go-camo at this time, opting for simplicity for now. If more people end up running into this, it might be worth adding but under a non-default cli-flag.

In the meantime, maybe this would help, in conjunction with some type of downstream proxy configuration (like "retry-on 403"). If a different status code is desired, just replace http.StatusForbidden with whatever other status code you would want there (418, etc).

diff --git a/pkg/camo/proxy.go b/pkg/camo/proxy.go
index 552c67b..50f4960 100644
--- a/pkg/camo/proxy.go
+++ b/pkg/camo/proxy.go
@@ -306,6 +306,18 @@ func (p *Proxy) ServeHTTP(w http.ResponseWriter, req *http.Request) {
 		p.copyHeaders(&h, &resp.Header, &ValidRespHeaders)
 		w.WriteHeader(304)
 		return
+	case 403:
+		if resp.Header.Get("cf-mitigated") == "challenge" {
+			// decide what to do if response is a cloudflare challenge
+			// this example returns a 403 and passes through the cf-mitigated header
+			w.Header().Set("cf-mitigated", "challenge")
+			http.Error(w, "Forbidden", http.StatusForbidden)
+			return
+		}
+
+		// otherwise handle normally and return 404
+		http.Error(w, "Not Found", http.StatusNotFound)
+		return
 	case 404:
 		http.Error(w, "Not Found", http.StatusNotFound)
 		return
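
As a rough illustration of the retry idea, the sketch below shows the logic a front proxy would need: try each go-camo instance in turn and move on whenever a response carries both status 403 and the cf-mitigated: challenge header that the patch above passes through. It is written as a standalone Go program rather than as haproxy or nginx configuration, and everything in it (fetchWithFallback, fakeInstances, tryInstances, the host names) is hypothetical. The fake transport stands in for real instances so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchWithFallback requests path from each backend in order and returns the
// first response that is not a Cloudflare challenge (403 plus
// "Cf-Mitigated: challenge"). If the last backend is also challenged, its
// response is returned as-is. The caller must close the response body.
func fetchWithFallback(client *http.Client, backends []string, path string) (*http.Response, error) {
	lastErr := errors.New("no backends configured")
	for i, base := range backends {
		resp, err := client.Get(base + path)
		if err != nil {
			lastErr = err // instance unreachable: try the next one
			continue
		}
		challenged := resp.StatusCode == http.StatusForbidden &&
			resp.Header.Get("Cf-Mitigated") == "challenge"
		if !challenged || i == len(backends)-1 {
			return resp, nil
		}
		resp.Body.Close() // discard the challenged response and move on
	}
	return nil, lastErr
}

// fakeInstances is a stand-in transport for the demo. It maps a backend host
// to whether that (hypothetical) go-camo instance is currently challenged.
type fakeInstances map[string]bool

func (f fakeInstances) RoundTrip(req *http.Request) (*http.Response, error) {
	challenged, known := f[req.URL.Host]
	if !known {
		return nil, fmt.Errorf("unknown host %q", req.URL.Host)
	}
	rec := httptest.NewRecorder()
	if challenged {
		// Mirrors what the patched go-camo would send back.
		rec.Header().Set("Cf-Mitigated", "challenge")
		http.Error(rec, "Forbidden", http.StatusForbidden)
	} else {
		fmt.Fprint(rec, "image-bytes")
	}
	return rec.Result(), nil
}

// tryInstances runs fetchWithFallback against the fake instances and returns
// the final status code and body.
func tryInstances(f fakeInstances, backends ...string) (int, string, error) {
	client := &http.Client{Transport: f}
	resp, err := fetchWithFallback(client, backends, "/signed/url")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	instances := fakeInstances{"camo-a.example": true, "camo-b.example": false}
	status, body, err := tryInstances(instances, "http://camo-a.example", "http://camo-b.example")
	fmt.Println(status, body, err)
}
```

A plain 403 without the header is returned to the caller unchanged, so only challenged requests cost an extra round trip.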

Thanks, this is great! I will see how I can use your patch and report back.

Closing for now. Reopen if appropriate.