Not works with Amazon Aurora
mats16 opened this issue · 16 comments
Bug report
Describe the bug
The supabase-realtime server does not work with Amazon Aurora PostgreSQL.
To Reproduce
- Change parameters for replications
parameters: {
'rds.logical_replication': '1',
'max_replication_slots': '5',
'max_wal_senders': '10',
'max_logical_replication_workers': '2',
'max_slot_wal_keep_size': '1024',
},
-
Create PostgreSQL Cluster (v14.3)
-
Launch supabase-realtime server and connect DB
-
The following log is displayed
2022-08-02 00:41:28.315 [info] Running RealtimeWeb.Endpoint with cowboy 2.8.0 at :::4000 (http)
2022-08-02 00:41:28.316 [info] Access RealtimeWeb.Endpoint at http://localhost:4000
2022-08-02 00:42:28.341 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
2022-08-02 00:42:28.377 [error] %Postgrex.Error{connection_id: 16419, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
Expected behavior
It works well, because it does not rely on extensions.
Screenshots
N/A
System information
- Version of supabase-realtime: v0.22.7
- Version of PostgreSQL: 14.3
Additional context
Add any other context about the problem here.
@mats16 did you make sure you have everything configured according to instructions found here: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Replication.Logical.html#AuroraPostgreSQL.Replication.Logical.Configure
Thanks! I already checked it...
I had created supabase_admin as rds_superuser, but it does not work.
Thanks.
This is my settings.
ParameterGroup (for Amazon Aurora PostgreSQL 14):
const parameterGroup = new rds.ParameterGroup(scope, 'ParameterGroup', {
engine: rds.DatabaseClusterEngine.auroraPostgres({ version: rds.AuroraPostgresEngineVersion.of('14.3', '14') }),
parameters: {
'rds.logical_replication': '1',
'max_replication_slots': '5', // Default Aurora:20, Supabase:5
'max_wal_senders': '10', // Default Aurora:20, Supabase:10
'max_logical_replication_workers': '2',
'autovacuum_max_workers': 'GREATEST({DBInstanceClassMemory/64371566592},2)', // Default: GREATEST({DBInstanceClassMemory/64371566592},3)
'max_parallel_workers': '4', // Default: GREATEST(${DBInstanceVCPU/2},8)
'max_slot_wal_keep_size': '1024', // https://github.com/supabase/realtime
},
});
Realtime server config as docker-compose.yml:
version: '3'
services:
realtime:
image: supabase/realtime:v0.22.7
ports:
- 4000:4000
environment:
DB_HOST: "xxxxx.cluster-xxxxx.us-west-2.rds.amazonaws.com"
DB_NAME: postgres
DB_USER: supabase_admin # it is rds_superuser
DB_PASSWORD: "xxxx"
DB_PORT: 5432
DB_SSL: "false"
PORT: 4000
JWT_SECRET: "xxx"
REPLICATION_MODE: RLS
REPLICATION_POLL_INTERVAL: 100
SECURE_CHANNELS: "false"
SLOT_NAME: realtime_rls
TEMPORARY_SLOT: "true"
MAX_REPLICATION_LAG_MB: 1000
command: >
bash -c "./prod/rel/realtime/bin/realtime eval Realtime.Release.migrate
&& ./prod/rel/realtime/bin/realtime start"
Test CDK project: https://github.com/mats16/supabase-on-aws
@mats16 can you take a look at https://github.com/supabase/realtime/blob/master/dev/postgres/mnt/00-setup.sql and make sure you have everything like a realtime
schema, a publication called supabase_realtime
, etc.?
@w3b6x9 My referrence is https://github.com/supabase/supabase/blob/master/docker/volumes/db/init/00-initial-schema.sql, so I have already created schema and publication.
I noticed that logs published at intervals of the value of SUBSCRIPTION_SYNC_INTERVAL.
Logs about SUBSCRIPTION_SYNC_INTERVAL: 10000
environment-realtime-1 | 2022-08-07 07:25:39.301 [info] Migrations already up
environment-realtime-1 | 2022-08-07 07:25:46.679 [info] Running RealtimeWeb.Endpoint with cowboy 2.8.0 at :::4000 (http)
environment-realtime-1 | 2022-08-07 07:25:46.680 [info] Access RealtimeWeb.Endpoint at http://localhost:4000
environment-realtime-1 | 2022-08-07 07:25:49.829 [info] tzdata release in place is from a file last modified Wed, 21 Oct 2020 18:40:20 GMT. Release file on server was last modified Wed, 16 Mar 2022 13:36:02 GMT.
environment-realtime-1 | 2022-08-07 07:25:51.029 [info] Tzdata has updated the release from 2020d to 2022a
environment-realtime-1 | 2022-08-07 07:25:56.696 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:25:56.939 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:26:06.711 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:26:07.276 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:26:16.727 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:26:17.667 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:26:26.744 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:26:27.078 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:26:36.760 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:26:37.447 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:26:46.774 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:26:46.858 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:26:56.791 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:26:57.220 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1 | 2022-08-07 07:27:06.806 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1 | 2022-08-07 07:27:07.555 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
Which query is to confirm connected subscribers and the tables?
I could not find it, also I am not good with Elixir.
@mats16 do you see a realtime.schema_migrations
table in your database? If you do, what's the latest migration you see?
@mats16 This should be the culprit:
.I do want to back up and mention that SECURE_CHANNELS
is required to be true
when REPLICATION_MODE
is RLS
. If you don't care about RLS and just want to get it working for the time being you should try it with REPLICATION_MODE
set to STREAM
.
Otherwise, make sure that you set SECURE_CHANNELS
to true
, sign a JWT with your JWT secret and make sure there's a role
key in your claims
. The role points to any existing db role (e.g. supabase_admin
). Have you added any tables to the supabase_realtime
publication yet? If not, you should do that as soon as possible.
@w3b6x9 thanks.
I connected in stream mode and it worked.
I also noticed the following error about web socket connection and no record existed in realtime.subscription
table.
{
"event":"phx_reply",
"payload":{
"response":{
"reason":"error occurred when joining realtime:public:messages with user token"
},
"status":"error"
},
"ref":"1",
"topic":"realtime:public:messages"
}
It is probably error related pgjwt or permissions.
We cannot use pgjwt on RDS, so we need to install it via SQL. If functions such as verify
work in extensions
schema, there is no problem, right?
(Also, sorry for creating the issue as a bug.)
@mats16 ah, that's my bad. We've had a few versions of Realtime and the latest one no longer respects the SECURE_CHANNELS
flag. You will have to set it to true
, generate a JWT (you can use an online tool for convenience) with your JWT_SECRET
(whatever xxx
actually is), pass that in, and you should be fine.
Realtime doesn't use the pgjwt
extension. JWT verification and validation is done on the Elixir side.
Also, sorry for creating the issue as a bug.
No worries, just want to make sure you're unblocked as soon as possible.
@w3b6x9 Thanks
I think I have found the root cause.
I confirmed that Aurora had received a query select rolname from pg_authid
.
https://github.com/supabase/realtime/blob/master/server/lib/realtime/rls/subscriptions/subscriptions.ex#L52
But we can not have permission to pg_authid
table on Amazon Aurora/RDS, because rds_superuser
role is not real SUPERUSER
role.
Is it possible to change using pg_authid
to pg_roles
(or pg_user
)?
I feel it is appropriate for security purposes not only for Amazon Aurora. Because pg_authid
contains a password. (although it is hashed)
Is it possible to change using
pg_authid
topg_roles
(orpg_user
)?
@mats16 you're right, switching to pg_roles
is better for security as it blanks the password. Would you like to do the honors and submit a PR? It should be as simple as replacing pg_authid
for pg_roles
.