superfly/litefs

When using proxy, first http request (triggering fly machine startup) returns connection refused

Closed this issue · 4 comments

I'm using litefs on fly.io. When my app has all machines stopped and I send a GET request for /, I receive a page with the following error instead of the expected output:

Proxy error: dial tcp 127.0.0.1:8081: connect: connection refused

I've made an example app (based on stripping down my original one) which demonstrates this: codyps/litefs-proxy-error@370bbaa

One can test either by deploying with the deploy.sh script (does some nix stuff), or by using a plain flyctl deploy --ha=false. Setup is as simple as doing a flyctl launch, accepting most defaults (just need the app created).

After starting the app, use flyctl machine stop ... to stop the machine associated with it, then issue an HTTP GET. You'll get something like this:

% curl -vv -L https://litefs-proxy-error.fly.dev/
*   Trying [2a09:8280:1::15:43ea]:443...
* Connected to litefs-proxy-error.fly.dev (2a09:8280:1::15:43ea) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=*.fly.dev
*  start date: Jun  9 23:43:51 2023 GMT
*  expire date: Sep  7 23:43:50 2023 GMT
*  subjectAltName: host "litefs-proxy-error.fly.dev" matched cert's "*.fly.dev"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* h2 [:method: GET]
* h2 [:scheme: https]
* h2 [:authority: litefs-proxy-error.fly.dev]
* h2 [:path: /]
* h2 [user-agent: curl/8.1.1]
* h2 [accept: */*]
* Using Stream ID: 1 (easy handle 0x7fd7a180ca00)
> GET / HTTP/2
> Host: litefs-proxy-error.fly.dev
> User-Agent: curl/8.1.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 502 
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< date: Sat, 22 Jul 2023 16:10:42 GMT
< content-length: 66
< server: Fly/a0b91024 (2023-06-13)
< via: 2 fly.io
< fly-request-id: 01H5Z5W51HMYASQRY9Z8AMZJXY-lga
< 
Proxy error: dial tcp 127.0.0.1:8081: connect: connection refused
* Connection #0 to host litefs-proxy-error.fly.dev left intact
%

Subsequent requests succeed.

Instead, we'd expect all requests to succeed (as occurs when not using the litefs proxy).

Some options:

  1. In litefs's proxy, wait to open the listening port until a TCP connection to the target port succeeds. (This most accurately preserves the behavior seen without the litefs proxy with respect to TCP health checks, etc.)
  2. In litefs's proxy, retry initial connection refused errors when connecting to the target.
  3. Somehow configure HTTP checks in fly.toml so that it waits for a successful GET before passing a request through. (I've poked at this a bit but haven't succeeded. It's possible I'm just getting the fly.toml content wrong.)
  4. Use systemd socket activation to pre-create the listener fd and pass it into the proxied command (#292). This would only work when also using the supervisor/exec functionality in litefs and when the executed command supports socket activation.

@jmesmon Thank you for the repro code. It makes it really easy to get a fix in. I ended up retrying on ECONNREFUSED. I tested the fix manually with your repo, but can you give it a quick try to make sure it's working how you expect? The PR is #368.

You can test it by just switching your Dockerfile to use the PR artifact:

COPY --from=flyio/litefs:pr-368 /usr/local/bin/litefs /usr/local/bin/litefs

@benbjohnson I have tested your PR with my app and I can no longer see that proxy error. 🎉

Thanks, @zaynetro! 🙏 I went ahead and merged and cut a new release: https://github.com/superfly/litefs/releases/tag/v0.5.2

Works for me too. Thanks.