singer-io/tap-postgres

Create (dataless) transaction to advance replication log for postgres RDS

Opened this issue · 0 comments

drdee commented

When you have a Postgres database in Amazon RDS that receives very little or no new data, the hard drive will fill-up with unreleased wal file segments. The reasons for that is (according to AWS support team):

In RDS, we write to a heartbeat table in our internal "rdsadmin" database every 5 minutes, so RDS
 instances are usually never "idle" for longer than 5 minutes, whereas PostgreSQL installed on EC2
 machine may appear "idle" indefinitely. On RDS hosts, we periodically write heartbeats to our admin 
DB on the host. As a result, even if there is no user activity, WAL files are generated. 

A similar issued affected Debezium and is discussed at https://issues.jboss.org/browse/DBZ-892?attachmentViewMode=gallery&_sscc=t

One way of fixing this is if tap-postgres could create a (dataless) transaction that advances the replication and thereby RDS releases the unused WAL files and preventing the hard drive from filling up. We have implemented our own _heartbeat table on the RDS databases and upsert the current timestamp in that table. Perhaps this is something that tap-postgres can do out-of-the-box.