pg_recvlogical very unstable

Question

pg_recvlogical very unstable

Closed this issue 5 years ago · 6 comments

Hello,
We use Wal2Json as a service. It begins to create a slot if not exists and then starts to listen to wal by using pg_recvlogical —start -o include-types=0 -f -
When 2 or 3 long queries are running on the database, it seems the pg_recvlogical stuck in state = startup and can not create the slot.
Did you ever heard of this issue ?
Moreover, when we need to use the option --write-in-chunks, because pg_recvlogical can not catchup, the json is not well formated ( in fact, it is well formated few times, also no data is provided) so this option is unstable.

Invalid output format cases
Case A
Objective
Read the data from the Wal2Json.
Connection method
pg_recvlogical -h host -d database --slot slot_name -U user --start -o add-tables=test.* -o include-types=0 -f -
 Query
UPDATE test.act SET action = action WHERE id=1000000033
 Wal2Json output result
We get the following output from the Wal2Json : a valid json string.
"""
{"change":[\n
{"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gid_entity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series._.1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n
]}\n
"""

And we are able to parse it right.
Case B
Objective
Read the data from the Wal2Json in “chunk” mode.
Connection method
We add -o write-in-chunks=true
2-pg_recvlogical -h host -d database --slot slot_name -U user --start -o add-tables=test.* -o include-types=0 -o write-in-chunks=true -f -
Query
Same query as in Case A
UPDATE test.act SET action = action WHERE id=1000000033
Wal2Json output result
We get 3 differents behaviours.

Resulting behaviour 1 : Nothing
No data is provided.
Resulting behaviour 2 : Non-parsable / incomplete data

We get partial data, first :
"{"change":[\n"
 Then that :
{"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gid_entity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n
]}\n
"""
3. Resulting behaviour 3 : Parsable data
Raw output  
"""
{"change":[\n
{"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gid_entity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n
]}\n
"""
Resulting object representation : that is fine.

{#1511
+"change": array:1 [
    0 => {#1442
     +"kind": "update"
     +"schema": "test"
     +"table": "act"
     +"columnnames": array:12 [
       0 => "active"
       1 => "created_at"
       2 => "id_created_by"
       3 => "updated_at"
       4 => "id_updated_by"
       5 => "origin"
       6 => "data"
       7 => "id_brand"
       8 => "id"
       9 => "id_user"
       10 => "gid_entity"
       11 => "action"
     ]
     +"columnvalues": array:12 [
       0 => true
       1 => "2019-01-28 16:54:36.64227"
       2 => null
       3 => "2019-01-28 16:54:36.64227"
       4 => null
       5 => "LEGACY"
       6 => "{}"
       7 => "AC"
       8 => 1000000033
       9 => 6
       10 => "series.series._.1"
       11 => "test.Action.Subscribe"
     ]
     +"oldkeys": {#1497
       +"keynames": array:2 [
         0 => "id_brand"
         1 => "id"
       ]
       +"keyvalues": array:2 [
         0 => "AC"
         1 => 1000000033
       ]
}
    }
]
}
Conclusion
In this “chunk mode “, the output results seems unpredictable to us.
What is this mode output logic we can rely on?

Thank you for your help.

Answer 1 · 2019-01-31T16:02:55.000Z

When a lot of processes are running on the database, it seems the pg_recvlogical stuck in state = startup and can not create the slot.

You don't provide enough information. Do you have big transactions? Logical decoding does not behave well with big transactions (that is why I designed --write-in-chunks). What is the error message? Could you get a stack trace?

Moreover, when we need to use the option --write-in-chunks, because pg_recvlogical can not catchup, the json is not well formated ( in fact, it is well formated few times) so this option is unstable.

The option --write-in-chunks was designed to fix a memory limit with big transactions. By default, wal2json emits the whole transaction at once and it didn't work with big transaction because PostgreSQL limits memory allocation to 1 GB (issue #4 ). If you use this option, the client needs to control the JSON start/end (because it emits pieces). I hope the new format version can cover these use cases.

Answer 2 · 2019-01-31T16:08:51.000Z

So, are you going to push a new release, for a better control of json output ? :)

Answer 3 · 2019-01-31T16:12:03.000Z

Yes, I designed another format. It fixes a lot of problems that users reported over the years.

Answer 4 · 2019-01-31T16:39:45.000Z

Nice to hear that. When will you release it ?
For the PostgreSQL limits memory allocation to 1 GB, is it possible to change this limit, is it in the conf file?
thx

Answer 5 · 2019-02-03T04:22:48.000Z

I look forward to seeing the changes. Will both schemas be available at the same time? How will someone switch between them?

Answer 6 · 2019-02-04T16:06:50.000Z

@taybin see commit f81bf7a