supabase-community/supabase-on-aws

Supabase Realtime does not works

mats16 opened this issue · 8 comments

次のようなエラーが出て動作しない。

2022-08-01 05:55:37.522 [error] %Postgrex.Error{connection_id: 16700, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"supabase_realtime_rls\" is active for PID 12303", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

この状態でつなぎに行くと

2022-08-01 05:56:11.784 [info] CONNECTED TO RealtimeWeb.UserSocket in 245µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V1.JSONSerializer
  Parameters: %{"apikey" => "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoiYW5vbiIsImlhdCI6MTY1ODk2NzE2NiwiZXhwIjoxOTc0NTQzMTY2LCJpc3MiOiJzdXBhYmFzZSJ9.yjbFON28xWRS-U4GhOtIvkUD0gjEr4NmKb7Kzgw-XI8", "vsn" => "1.0.0"}

2022-08-01 05:56:14.826 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}

2022-08-01 05:56:14.826 [info] REFUSED JOIN realtime:public:messages in 18ms
  Parameters: %{"user_token" => "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoiYW5vbiIsImlhdCI6MTY1ODk2NzE2NiwiZXhwIjoxOTc0NTQzMTY2LCJpc3MiOiJzdXBhYmFzZSJ9.yjbFON28xWRS-U4GhOtIvkUD0gjEr4NmKb7Kzgw-XI8"}

2022-08-01 05:56:14.845 [error] %Postgrex.Error{connection_id: 16700, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"supabase_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

supabase/realtime#22 (comment)

This is an ongoing problem with, for example, ECS tasks in AWS, when the new one has started before the old one is killed. The old one hasn't released the replication slot id, while the new one is trying to claim it.

ECS のローリングアップデートだと同一スロットで複数接続するので対応が必要そう

ローリングアップデートの際のエラーはコレっぽい
connection_id: 30689 で接続に行って、過去の active for PID 14208 が残ってるっぽい

2022-08-01 21:29:33.460 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"realtime_rls\" is active for PID 14208", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

この直後にコレが出てる(進捗なし)

2022-08-01 21:32:54.929 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}

2022-08-01 21:32:55.058 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

Supabase 本家の権限も一応確認してみたが、見た感じ supabase_admin が Superuser で、postgres はユーザー用に用意されたメンテ用ユーザーっぽい

postgres=> \du
                                                             List of roles
       Role name        |                         Attributes                         |                    Member of
------------------------+------------------------------------------------------------+--------------------------------------------------
 anon                   | No inheritance, Cannot login                               | {}
 authenticated          | No inheritance, Cannot login                               | {}
 authenticator          | No inheritance                                             | {anon,authenticated,service_role,supabase_admin}
 dashboard_user         | Create role, Create DB, Cannot login, Replication          | {}
 pgbouncer              |                                                            | {}
 postgres               | Create role, Create DB, Replication, Bypass RLS            | {supabase_auth_admin,supabase_storage_admin}
 service_role           | No inheritance, Cannot login, Bypass RLS                   | {}
 supabase_admin         | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 supabase_auth_admin    | No inheritance, Create role                                | {}
 supabase_storage_admin | No inheritance, Create role

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Replication.Logical.html#AuroraPostgreSQL.Replication.Logical.Configure

Ensure that max_worker_processes is at least as high as the combined values of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers. Having a high number of background worker processes might affect application workloads on small DB instance classes, so monitor the performance of your database if you set max_worker_processes higher than the default value.

max_worker_processes: 8

max_logical_replication_workers: 2
autovacuum_max_workers: 3
max_parallel_workers: 8
  • RDS (非 Aurora) も同じエラー
  • pglogical 有効化してみたけど変わらず

pg_receivewal 繋いでみて状況把握しようとするも別のエラーで出来ず。

$ /usr/pgsql-14/bin/pg_receivewal -D ./pg_receivewal/ --slot=cloud9 --create-slot -d postgresql://supabase_admin:xxxx@xxxx.us-west-2.rds.amazonaws.com/postgres

pg_receivewal: 
error: 
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed: 
FATAL:  no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", SSL encryption
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed: 
FATAL:  no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", no encryption

Websocker の接続の際に次のようなエラーが出てたので調査

{
  "event":"phx_reply",
  "payload":{
    "response":{
      "reason":"error occurred when joining realtime:public:messages with user token"
    },
    "status":"error"
  },
  "ref":"1",
  "topic":"realtime:public:messages"
}

ローカル(Mac)で自分で作成した jwt_secret に置き換えて起動

問題なく動く。JWT のフォーマット起因では無さそう

Fargate の起動モードを x86 に変更

変わらず

Cloud9 上で docker-compose して DB は Aurora

同じエラー。異なるのは DB だけなので DB 起因??

supabase/realtime#270

原因特定してパッチ送った