postgresql.write_recovery_conf({"address": "postgres://169.0.0.1:5432"})
scalp42 opened this issue · 2 comments
Hi there @Winslett,
I'm curious about this line:
postgresql.write_recovery_conf({"address": "postgres://169.0.0.1:5432"})
I assume it should get the current leader from etcd instead of being hardcoded.
What is the context around this, please?
Thanks in advance!
TL;DR: it solves the problem of an unknown leader.
When a stopped PostgreSQL with files in the data directory is started, it is started as a secondary without an accessible leader. I chose 169.0.0.1 because I thought it was in the "Automatic Private IP Addressing" space, but I was wrong. It looks like I should have chosen 169.254.x.y.
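For anyone who wants to double-check the address ranges, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `is_link_local` is only for illustration and is not part of governor.

```python
import ipaddress


def is_link_local(host: str) -> bool:
    """Return True if host is in the link-local (APIPA) block, 169.254.0.0/16 for IPv4."""
    return ipaddress.ip_address(host).is_link_local


print(is_link_local("169.0.0.1"))    # False: outside 169.254.0.0/16
print(is_link_local("169.254.0.1"))  # True: inside the link-local block
```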
I originally made this change for the scenario of an initialized deployment where no Postgres members are running and the etcd TTL has expired. In that scenario, the best approach I could come up with was:
1. governor starts PostgreSQL as a secondary
2. governor queries all known PostgreSQL members to determine healthiest member
3. governor only tries to take TTL if it is the healthiest member
Using this method, a deployment can recover from all hosts going down.
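Roughly, the decision in the last two steps could look like the sketch below. This is not the actual governor code: `Member`, `xlog_position`, and `should_try_leader` are hypothetical names, and "healthiest" is assumed here to mean "no other member has a higher xlog position".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    xlog_position: int  # assumed: WAL position in bytes, higher means more data


def should_try_leader(me: Member, others: List[Member]) -> bool:
    """Return True only if no other known member is ahead of this one."""
    return all(me.xlog_position >= other.xlog_position for other in others)


me = Member("node-a", 100)
print(should_try_leader(me, [Member("node-b", 90)]))   # True: node-a is ahead
print(should_try_leader(me, [Member("node-b", 120)]))  # False: node-b is ahead
```

In this sketch, two members at the same position would both try for the leader key, so a tie is left to whatever arbitrates the key.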
In this governor project, there is one check I did not implement between steps 2 and 3 above: an "is reasonably healthy" check. To support it, each governor would store the leader's last known xlog position on every HA loop. Step 2.5 would then be "confirm this host is within some number of megabytes of the last known leader xlog position." This additional check would prevent a stale PostgreSQL from taking over as leader if it is the only member running.
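A minimal sketch of what that unimplemented check might look like, under stated assumptions: the name `is_reasonably_healthy`, the 16 MB threshold, and the choice to pass when no leader position was ever recorded are all hypothetical, not something governor does today.

```python
from typing import Optional

# Hypothetical threshold: how far behind the last known leader position is acceptable.
MAX_LAG_BYTES = 16 * 1024 * 1024


def is_reasonably_healthy(my_xlog: int,
                          last_leader_xlog: Optional[int],
                          max_lag: int = MAX_LAG_BYTES) -> bool:
    """Return True if this host is within max_lag bytes of the last known leader position."""
    if last_leader_xlog is None:
        # Assumption: with no recorded leader position there is nothing to compare against.
        return True
    return last_leader_xlog - my_xlog <= max_lag


print(is_reasonably_healthy(1_000_000, 1_500_000))   # True: about 0.5 MB behind
print(is_reasonably_healthy(1_000_000, 50_000_000))  # False: about 49 MB behind
```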
After I solved for the scenario above, I never had issues with PostgreSQL coming online as a secondary. When the HA loop ran, governor would point the secondary to the proper leader, so it wasn't an issue.