Logical replication does not advance when a large volume of changes is generated in a different database
volodymyr-mykhailyk commented
Background:
- The current code flushes changes and updates the bookmark only when it encounters messages for the replication slot.
- Postgres reads WAL files sequentially but responds only with messages belonging to the replication slot's database.
- If another database generates a large volume of changes, Postgres does not have enough time to read through the large WAL files and send messages for the current replication slot that would advance the LSN.
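A minimal sketch of the problematic pattern. This is a pure-Python simulation, not the tap's actual code: `FakeCursor`, `consume`, and the event tuples are stand-ins, though the shape mirrors a psycopg2 `ReplicationCursor` loop where `read_message()` returns `None` while the server works through WAL that carries no messages for this slot.

```python
class FakeCursor:
    """Stands in for a replication cursor; wal_end models the
    server-reported position reached while scanning WAL."""
    def __init__(self, events):
        self.events = iter(events)   # (payload_or_None, wal_end) pairs
        self.wal_end = 0

    def read_message(self):
        payload, self.wal_end = next(self.events)
        return payload               # None: WAL consumed, but no message for this slot

def consume(cursor, steps, bookmark):
    for _ in range(steps):
        msg = cursor.read_message()
        if msg is not None:
            # Flush and bookmark update happen ONLY here in the current code.
            bookmark = cursor.wal_end
    return bookmark

# WAL generated by another database: the server scans 10000 -> 15000
# but emits no messages for this slot, so the bookmark never moves.
events = [(None, 11000), (None, 13000), (None, 15000)]
bookmark = consume(FakeCursor(events), 3, bookmark=10000)
print(bookmark)  # 10000 -- unchanged despite wal_end reaching 15000
```

On the next run the tap resumes from the stale bookmark and re-reads the same WAL, which is exactly the loop described in the scenario below.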
Infrastructure:
Postgres instance:
- database1 -> generates x GB of WAL in a short burst -> has replication_slot1 -> replicates every hour
- database2 -> generates 2000 records every 6 hours -> has replication_slot2 -> replicates every 6 hours
Scenario:
- All changes from both databases are replicated and the last LSN points to 10000.
- database1 has a burst and generates 100 GB of WAL files, advancing the LSN to 20000.
- database2 adds 2000 records 1 hour after the burst, advancing the LSN to 20100.
- Replication for database2 starts reading changes from LSN 10000. In 30 minutes the DB is able to read WAL only up to LSN 15000, so no replication_slot2 messages are produced. Since no messages were replicated, no flush has been issued and the bookmark was not updated by this part of the code.
- The next time replication starts, it starts again from LSN 10000.
Possible Solution:
- Advance the bookmark using the wal_end value from the cursor when no messages have been produced.
- Trigger flush_lsn with the same wal_end value.
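The two bullets above could be sketched as follows. This is a self-contained simulation with hypothetical names, not the project's actual code; on a real psycopg2 `ReplicationCursor`, `wal_end` and `send_feedback(flush_lsn=...)` exist and would play the roles modeled here.

```python
class FakeCursor:
    """Stands in for a replication cursor; tracks the last flushed LSN."""
    def __init__(self, events):
        self.events = iter(events)   # (payload_or_None, wal_end) pairs
        self.wal_end = 0
        self.flushed_lsn = None      # last send_feedback(flush_lsn=...) value

    def read_message(self):
        payload, self.wal_end = next(self.events)
        return payload

    def send_feedback(self, flush_lsn):
        self.flushed_lsn = flush_lsn

def consume_with_fix(cursor, steps, bookmark):
    for _ in range(steps):
        msg = cursor.read_message()
        if msg is not None:
            bookmark = cursor.wal_end
            cursor.send_feedback(flush_lsn=bookmark)
        else:
            # Proposed fix: even when no slot messages are produced,
            # advance the bookmark to wal_end and flush the same value,
            # so a restart does not re-read the already-scanned WAL.
            bookmark = cursor.wal_end
            cursor.send_feedback(flush_lsn=bookmark)
    return bookmark

events = [(None, 11000), (None, 13000), (None, 15000)]
cur = FakeCursor(events)
bookmark = consume_with_fix(cur, 3, bookmark=10000)
print(bookmark, cur.flushed_lsn)  # 15000 15000
```

With this change, the restart in the scenario above would resume from LSN 15000 instead of 10000, and the slot would catch up across runs even when every message in the scanned WAL belongs to the other database.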