Proxy not forwarding POST requests to primary
Closed this issue · 7 comments
Setup
NodeJS+Express backend with 3 fly machines.
Standard configuration of LiteFS described here: https://fly.io/docs/litefs/getting-started-fly/
fly.toml
[http_service]
internal_port = 8081
force_https = true
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 0
[[services]]
internal_port = 8080
protocol = "tcp"
[[services.ports]]
port = 80
handlers = ["http"]
force_https = true
[[services.ports]]
port = 443
handlers = ["tls", "http"]
[services.concurrency]
type = "requests"
hard_limit = 200
soft_limit = 150
[[services.tcp_checks]]
interval = "15s"
timeout = "2s"
grace_period = "10s"
restart_limit = 6
# https://community.fly.io/t/edge-headers/9278
[services.ports.http_options.response.headers]
Access-Control-Allow-Origin = "*"
Access-Control-Allow-Methods = "*"
Access-Control-Allow-Headers = "*"
litefs.yml
# The fuse section describes settings for the FUSE file system. This file system
# is used as a thin layer between the SQLite client in your application and the
# storage on disk. It intercepts disk writes to determine transaction boundaries
# so that those transactions can be saved and shipped to replicas.
fuse:
dir: "/litefs"
# The data section describes settings for the internal LiteFS storage. We'll
# mount a volume to the data directory so it can be persisted across restarts.
# However, this data should not be accessed directly by the user application.
data:
dir: "/var/lib/litefs"
# This flag ensure that LiteFS continues to run if there is an issue on starup.
# It makes it easy to ssh in and debug any issues you might be having rather
# than continually restarting on initialization failure.
exit-on-error: false
# This section defines settings for the option HTTP proxy.
# This proxy can handle primary forwarding & replica consistency
# for applications that use a single SQLite database.
proxy:
addr: ":8080"
target: "localhost:8081"
db: "db"
passthrough:
- "*.ico"
- "*.png"
# This section defines a list of commands to run after LiteFS has connected
# and sync'd with the cluster. You can run multiple commands but LiteFS expects
# the last command to be long-running (e.g. an application server). When the
# last command exits, LiteFS is shut down.
exec:
# Only run migrations on candidate nodes.
- cmd: "npm run migrate"
if-candidate: true
# Then run the application server on all nodes.
- cmd: "npm run start"
lease:
type: "consul"
# Specifies if this node can become primary. The expression below evaluates
# to true on nodes that are run in the primary region. Nodes in other regions
# act as non-candidate, read-only replicas.
candidate: ${FLY_REGION == PRIMARY_REGION}
# If true, then the node will automatically become primary after it has
# connected with the cluster and sync'd up. This makes it easier to run
# migrations on start up.
promote: true
# The API URL that other nodes will use to connect to this node.
advertise-url: "http://${FLY_ALLOC_ID}.vm.${FLY_APP_NAME}.internal:20202"
consul:
# The URL of the Consul cluster.
url: "${FLY_CONSUL_URL}"
# A unique key shared by all nodes in the LiteFS cluster.
# Change this if you are running multiple clusters in a single app!
key: "${FLY_APP_NAME}/primary"
Reproduction
Log lines below show the issue, and indicate that the POST request was successful after a few tries. ** 2023-08-13T18:06:16Z app[6e82d392c7d5e8] ** below is to indicate the replica change (OR successful write forwarding), which resulted in a success.
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info]SqliteError: disk I/O error
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at Object.create (/usr/src/app/dist/services/storyLikes.js:54:33)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at /usr/src/app/dist/routes/storyLikes.js:30:43
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-13T18:06:10Z app[32874699b19e48] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/index.js:280:10)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info]SqliteError: disk I/O error
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at Object.create (/usr/src/app/dist/services/storyLikes.js:54:33)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at /usr/src/app/dist/routes/storyLikes.js:30:43
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-13T18:06:12Z app[e784e046fd6718] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/index.js:280:10)
**2023-08-13T18:06:13Z app[32874699b19e48]** sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info]SqliteError: disk I/O error
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at Object.create (/usr/src/app/dist/services/storyLikes.js:54:33)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at /usr/src/app/dist/routes/storyLikes.js:30:43
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-13T18:06:13Z app[32874699b19e48] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/index.js:280:10)
**2023-08-13T18:06:16Z app[6e82d392c7d5e8]** sjc [info]{"level":30,"time":1691949976874,"pid":335,"hostname":"6e82d392c7d5e8","msg":"2a02:26f7:c344:8014:0:10da:1728:37ec, 2a09:8280:1::24:3997 Successfully liked story 853c1402-fe8a-40a4-86ad-c30dec36ad12}"}
P.S. Was not sure what other details I should be submitting here. Please let me know if there is more information I can/need to share. Thanks in advance.
@harshav17 Thanks for submitting the issue. Do you know what version of LiteFS is running? It should show when the machine starts up. We made a fix recently (#373) that changes how POSTs are handled when a primary is not known by the replica.
Thanks for responding.
0.5.2. Should I upgrade to 0.5.4 and see if the POST calls start returning 503 error codes?
@harshav17 Sorry, I missed your reply. Yes, it's worth a try with v0.5.4. Let me know if you still see an issue.
Upgrade to 0.5.4 did not help, I still see the issue.
2023-08-21T05:17:42Z app[e784e046fd6718] sjc [info]{"level":30,"time":1692595062389,"pid":335,"hostname":"e784e046fd6718","msg":"Successfully got 28 stories"}
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]level=INFO msg="fuse: write(): wal error: read only replica"
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]Error while adding story likes SqliteError: disk I/O error}
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info]SqliteError: disk I/O error
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Object.run (/usr/src/app/dist/db.js:7:28)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Object.create (/usr/src/app/dist/models/storylikes.js:9:33)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Object.create (/usr/src/app/dist/services/storylikes.js:54:33)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at /usr/src/app/dist/routes/storylikes.js:30:43
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at next (/usr/src/app/node_modules/express/lib/router/route.js:144:13)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:114:3)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at /usr/src/app/node_modules/express/lib/router/index.js:284:15
2023-08-21T05:17:51Z app[e784e046fd6718] sjc [info] at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:346:12)
2023-08-21T05:18:19Z app[6e82d392c7d5e8] sjc [info]{"level":30,"time":1692595099276,"pid":336,"hostname":"6e82d392c7d5e8","msg":"Successfully got The Enchanted Forest Adventure story"}
2023-08-21T05:18:23Z app[6e82d392c7d5e8] sjc [info]{"level":30,"time":1692595103109,"pid":336,"hostname":"6e82d392c7d5e8","msg":"2a02:26f7:c344:8013:0:69e1:4c50:8210, 2a09:8280:1::24:3997 Successfully liked story a1205ac8-a7af-4d35-9630-c89b202bca4e}"}
Hi @harshav17
You have two different services that listen on 80/443 defined in your config:
http_service
, that forwards to port 8081 (your app)service
, that forwards to port 8080 (LiteFS proxy).
I'm not sure which one takes precedence in fly-proxy in this case. Try to remove http_service
part of the config to make sure that all requests are processed by LiteFS Proxy.
yep, that fixed it! Thank you both!
This is it! Wow this took me a solid 16 hours of debugging to find this solution. I couldn't figure out why 50% of my POST requests were failing (being sent to a read only replica) instead of the primary. This solved my issue. Thanks a ton.