Supabase Realtime does not works
mats16 opened this issue · 8 comments
次のようなエラーが出て動作しない。
2022-08-01 05:55:37.522 [error] %Postgrex.Error{connection_id: 16700, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"supabase_realtime_rls\" is active for PID 12303", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
この状態でつなぎに行くと
2022-08-01 05:56:11.784 [info] CONNECTED TO RealtimeWeb.UserSocket in 245µs
Transport: :websocket
Serializer: Phoenix.Socket.V1.JSONSerializer
Parameters: %{"apikey" => "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoiYW5vbiIsImlhdCI6MTY1ODk2NzE2NiwiZXhwIjoxOTc0NTQzMTY2LCJpc3MiOiJzdXBhYmFzZSJ9.yjbFON28xWRS-U4GhOtIvkUD0gjEr4NmKb7Kzgw-XI8", "vsn" => "1.0.0"}
2022-08-01 05:56:14.826 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
2022-08-01 05:56:14.826 [info] REFUSED JOIN realtime:public:messages in 18ms
Parameters: %{"user_token" => "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoiYW5vbiIsImlhdCI6MTY1ODk2NzE2NiwiZXhwIjoxOTc0NTQzMTY2LCJpc3MiOiJzdXBhYmFzZSJ9.yjbFON28xWRS-U4GhOtIvkUD0gjEr4NmKb7Kzgw-XI8"}
2022-08-01 05:56:14.845 [error] %Postgrex.Error{connection_id: 16700, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"supabase_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
supabase/realtime#22 (comment)
This is an ongoing problem with, for example, ECS tasks in AWS, when the new one has started before the old one is killed. The old one hasn't released the replication slot id, while the new one is trying to claim it.
ECS のローリングアップデートだと同一スロットで複数接続するので対応が必要そう
ローリングアップデートの際のエラーはコレっぽい
connection_id: 30689
で接続に行って、過去の active for PID 14208
が残ってるっぽい
2022-08-01 21:29:33.460 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"realtime_rls\" is active for PID 14208", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
この直後にコレが出てる(進捗なし)
2022-08-01 21:32:54.929 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
2022-08-01 21:32:55.058 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
Supabase 本家の権限も一応確認してみたが、見た感じ supabase_admin が Superuser で、postgres はユーザー用に用意されたメンテ用ユーザーっぽい
postgres=> \du
List of roles
Role name | Attributes | Member of
------------------------+------------------------------------------------------------+--------------------------------------------------
anon | No inheritance, Cannot login | {}
authenticated | No inheritance, Cannot login | {}
authenticator | No inheritance | {anon,authenticated,service_role,supabase_admin}
dashboard_user | Create role, Create DB, Cannot login, Replication | {}
pgbouncer | | {}
postgres | Create role, Create DB, Replication, Bypass RLS | {supabase_auth_admin,supabase_storage_admin}
service_role | No inheritance, Cannot login, Bypass RLS | {}
supabase_admin | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
supabase_auth_admin | No inheritance, Create role | {}
supabase_storage_admin | No inheritance, Create role
Ensure that max_worker_processes is at least as high as the combined values of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers. Having a high number of background worker processes might affect application workloads on small DB instance classes, so monitor the performance of your database if you set max_worker_processes higher than the default value.
今
max_worker_processes: 8
max_logical_replication_workers: 2
autovacuum_max_workers: 3
max_parallel_workers: 8
- RDS (非 Aurora) も同じエラー
- pglogical 有効化してみたけど変わらず
pg_receivewal
繋いでみて状況把握しようとするも別のエラーで出来ず。
$ /usr/pgsql-14/bin/pg_receivewal -D ./pg_receivewal/ --slot=cloud9 --create-slot -d postgresql://supabase_admin:xxxx@xxxx.us-west-2.rds.amazonaws.com/postgres
pg_receivewal:
error:
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed:
FATAL: no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", SSL encryption
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed:
FATAL: no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", no encryption
Websocker の接続の際に次のようなエラーが出てたので調査
{
"event":"phx_reply",
"payload":{
"response":{
"reason":"error occurred when joining realtime:public:messages with user token"
},
"status":"error"
},
"ref":"1",
"topic":"realtime:public:messages"
}
ローカル(Mac)で自分で作成した jwt_secret に置き換えて起動
問題なく動く。JWT のフォーマット起因では無さそう
Fargate の起動モードを x86 に変更
変わらず
Cloud9 上で docker-compose して DB は Aurora
同じエラー。異なるのは DB だけなので DB 起因??
原因特定してパッチ送った