superfly/litefs

Proxy not forwarding POST requests to primary

Closed this issue · 7 comments

Setup
NodeJS+Express backend with 3 fly machines.
Standard configuration of LiteFS described here: https://fly.io/docs/litefs/getting-started-fly/

fly.toml

[http_service]
internal_port = 8081
force_https = true
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 0

[[services]]
internal_port = 8080
protocol = "tcp"

[[services.ports]]
port = 80
handlers = ["http"]
force_https = true

[[services.ports]]
port = 443
handlers = ["tls", "http"]

[services.concurrency]
type = "requests"
hard_limit = 200
soft_limit = 150

[[services.tcp_checks]]
interval = "15s"
timeout = "2s"
grace_period = "10s"
restart_limit = 6

# https://community.fly.io/t/edge-headers/9278
[services.ports.http_options.response.headers]
Access-Control-Allow-Origin = "*"
Access-Control-Allow-Methods = "*"
Access-Control-Allow-Headers = "*"

litefs.yml

# The fuse section describes settings for the FUSE file system. This file system
# is used as a thin layer between the SQLite client in your application and the
# storage on disk. It intercepts disk writes to determine transaction boundaries
# so that those transactions can be saved and shipped to replicas.
fuse:
    dir: "/litefs"

# The data section describes settings for the internal LiteFS storage. We'll
# mount a volume to the data directory so it can be persisted across restarts.
# However, this data should not be accessed directly by the user application.
data:
    dir: "/var/lib/litefs"

# This flag ensure that LiteFS continues to run if there is an issue on starup.
# It makes it easy to ssh in and debug any issues you might be having rather
# than continually restarting on initialization failure.
exit-on-error: false

# This section defines settings for the option HTTP proxy.
# This proxy can handle primary forwarding & replica consistency
# for applications that use a single SQLite database.
proxy:
    addr: ":8080"
    target: "localhost:8081"
    db: "db"
    passthrough:
        - "*.ico"
        - "*.png"

# This section defines a list of commands to run after LiteFS has connected
# and sync'd with the cluster. You can run multiple commands but LiteFS expects
# the last command to be long-running (e.g. an application server). When the
# last command exits, LiteFS is shut down.
exec:
    # Only run migrations on candidate nodes.
    - cmd: "npm run migrate"
      if-candidate: true

    # Then run the application server on all nodes.
    - cmd: "npm run start"

lease:
  type: "consul"

  # Specifies if this node can become primary. The expression below evaluates
  # to true on nodes that are run in the primary region. Nodes in other regions
  # act as non-candidate, read-only replicas.
  candidate: ${FLY_REGION == PRIMARY_REGION}

  # If true, then the node will automatically become primary after it has
  # connected with the cluster and sync'd up. This makes it easier to run
  # migrations on start up.
  promote: true

  # The API URL that other nodes will use to connect to this node.
  advertise-url: "http://${FLY_ALLOC_ID}.vm.${FLY_APP_NAME}.internal:20202"

  consul:
    # The URL of the Consul cluster.
    url: "${FLY_CONSUL_URL}"

    # A unique key shared by all nodes in the LiteFS cluster.
    # Change this if you are running multiple clusters in a single app!
    key: "${FLY_APP_NAME}/primary"

Reproduction
Log lines below show the issue, and indicate that the POST request was successful after a few tries. ** 2023-08-13T18:06:16Z app[6e82d392c7d5e8] ** below is to indicate the replica change (OR successful write forwarding), which resulted in a success.

2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]SqliteError: disk I/O error
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at Object.create (/usr/src/app/dist/services/storyLikes.js:54:33)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at /usr/src/app/dist/routes/storyLikes.js:30:43
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/index.js:280:10)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]SqliteError: disk I/O error
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at Object.create (/usr/src/app/dist/services/storyLikes.js:54:33)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at /usr/src/app/dist/routes/storyLikes.js:30:43
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/index.js:280:10)
**2023-08-13T18:06:13Z app[32874699b19e48]** sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]SqliteError: disk I/O error
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at Object.create (/usr/src/app/dist/services/storyLikes.js:54:33)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at /usr/src/app/dist/routes/storyLikes.js:30:43
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/index.js:280:10)
**2023-08-13T18:06:16Z app[6e82d392c7d5e8]** sjc [info]{"level":30,"time":1691949976874,"pid":335,"hostname":"6e82d392c7d5e8","msg":"2a02:26f7:c344:8014:0:10da:1728:37ec, 2a09:8280:1::24:3997 Successfully liked story 853c1402-fe8a-40a4-86ad-c30dec36ad12}"}

P.S. Was not sure what other details I should be submitting here. Please let me know if there is more information I can/need to share. Thanks in advance.

@harshav17 Thanks for submitting the issue. Do you know what version of LiteFS is running? It should show when the machine starts up. We made a fix recently (#373) that changes how POSTs are handled when a primary is not known by the replica.

Thanks for responding.

0.5.2. Should I upgrade to 0.5.4 and see if the POST calls start returning 503 error codes?

@harshav17 Sorry, I missed your reply. Yes, it's worth a try with v0.5.4. Let me know if you still see an issue.

Upgrade to 0.5.4 did not help, I still see the issue.

2023-08-21T05:17:42Z app[e784e046fd6718] sjc [info]{"level":30,"time":1692595062389,"pid":335,"hostname":"e784e046fd6718","msg":"Successfully got 28 stories"}
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]SqliteError: disk I/O error
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Object.create (/usr/src/app/dist/models/storylikes.js:9:33)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Object.create (/usr/src/app/dist/services/storylikes.js:54:33)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at /usr/src/app/dist/routes/storylikes.js:30:43
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]    at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-21T05:18:19Z app[6e82d392c7d5e8] sjc [info]{"level":30,"time":1692595099276,"pid":336,"hostname":"6e82d392c7d5e8","msg":"Successfully got The Enchanted Forest Adventure story"}
2023-08-21T05:18:23Z app[6e82d392c7d5e8] sjc [info]{"level":30,"time":1692595103109,"pid":336,"hostname":"6e82d392c7d5e8","msg":"2a02:26f7:c344:8013:0:69e1:4c50:8210, 2a09:8280:1::24:3997 Successfully liked story a1205ac8-a7af-4d35-9630-c89b202bca4e}"}

Hi @harshav17

You have two different services that listen on 80/443 defined in your config:

  • http_service, that forwards to port 8081 (your app)
  • service, that forwards to port 8080 (LiteFS proxy).

I'm not sure which one takes precedence in fly-proxy in this case. Try to remove http_service part of the config to make sure that all requests are processed by LiteFS Proxy.

yep, that fixed it! Thank you both!

This is it! Wow this took me a solid 16 hours of debugging to find this solution. I couldn't figure out why 50% of my POST requests were failing (being sent to a read only replica) instead of the primary. This solved my issue. Thanks a ton.