When using proxy, first http request (triggering fly machine startup) returns connection refused
I'm using litefs on fly.io. When my app has all machines stopped, and I send a GET request for /, instead of the expected output, I receive a page
Proxy error: dial tcp 127.0.0.1:8081: connect: connection refused
I've made an example app (based on stripping down my original one) which demonstrates this: codyps/litefs-proxy-error@370bbaa
One can test either by deploying with the deploy.sh script (does some nix stuff), or by using a plain flyctl deploy --ha=false. Setup is as simple as doing a flyctl launch, accepting most defaults (just need the app created).
After starting the app, use flyctl machine stop ... to stop the machine associated with it, then issue an HTTP GET. You'll get something like this:
% curl -vv -L https://litefs-proxy-error.fly.dev/
* Trying [2a09:8280:1::15:43ea]:443...
* Connected to litefs-proxy-error.fly.dev (2a09:8280:1::15:43ea) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
* subject: CN=*.fly.dev
* start date: Jun 9 23:43:51 2023 GMT
* expire date: Sep 7 23:43:50 2023 GMT
* subjectAltName: host "litefs-proxy-error.fly.dev" matched cert's "*.fly.dev"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
* using HTTP/2
* h2 [:method: GET]
* h2 [:scheme: https]
* h2 [:authority: litefs-proxy-error.fly.dev]
* h2 [:path: /]
* h2 [user-agent: curl/8.1.1]
* h2 [accept: */*]
* Using Stream ID: 1 (easy handle 0x7fd7a180ca00)
> GET / HTTP/2
> Host: litefs-proxy-error.fly.dev
> User-Agent: curl/8.1.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 502
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< date: Sat, 22 Jul 2023 16:10:42 GMT
< content-length: 66
< server: Fly/a0b91024 (2023-06-13)
< via: 2 fly.io
< fly-request-id: 01H5Z5W51HMYASQRY9Z8AMZJXY-lga
<
Proxy error: dial tcp 127.0.0.1:8081: connect: connection refused
* Connection #0 to host litefs-proxy-error.fly.dev left intact
%
Subsequent requests succeed.
Instead, we'd expect all requests to succeed (as occurs when not using the litefs proxy).
Some options:
- in litefs's proxy, wait to start the listening port until TCP connectivity to the target port succeeds. (This most accurately preserves behavior without litefs proxy wrt TCP health checks, etc)
- in litefs's proxy, retry initial connection refused errors when connecting to the target
- somehow configure http checks in fly.toml so that it waits for a successful get before passing through a request (I've poked at this a bit, but haven't succeeded. Possible I'm just messing up the fly.toml content)
- use systemd socket activation to pre-create the listener fd and pass it into the proxied command (#292). This would only work when also using the supervisor/exec functionality in litefs and when the executed command supports socket activation.
@jmesmon Thank you for the repro code. It makes it really easy to get a fix in. I ended up retrying on ECONNREFUSED. I tested the fix manually with your repo, but can you give it a quick try to make sure it's working how you're expecting it to? The PR is #368.
You can test it by just switching your Dockerfile to use the PR artifact:
COPY --from=flyio/litefs:pr-368 /usr/local/bin/litefs /usr/local/bin/litefs
@benbjohnson I have tested your PR with my app and I can no longer see that proxy error.
Thanks, @zaynetro! I went ahead and merged and cut a new release: https://github.com/superfly/litefs/releases/tag/v0.5.2
Works for me too. Thanks.