compose/governor

postgresql.write_recovery_conf({"address": "postgres://169.0.0.1:5432"})

scalp42 opened this issue · 2 comments

Hi there @Winslett,

I'm curious about this line:

postgresql.write_recovery_conf({"address": "postgres://169.0.0.1:5432"})

I assume it should get the current leader from etcd instead of being hardcoded.

What is the context around this, please?

Thanks in advance!

TL;DR: it solves the problem of an unknown leader.

When a stopped PostgreSQL with files in the data directory is started, it is started as a secondary without an accessible leader. I chose 169.0.0.1 because I thought it was in the "Automatic Private IP Addressing" space, but I was wrong -- looks like I should have chosen 169.254.x.y.
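A quick way to check the address ranges mentioned above is Python's standard `ipaddress` module. This is only an illustration: `169.254.0.1` is an arbitrary example from the 169.254.x.y range, not an address governor uses.

```python
import ipaddress

# The placeholder address currently hardcoded in the recovery conf.
placeholder = ipaddress.ip_address("169.0.0.1")

# An arbitrary example from the 169.254.x.y (link-local / APIPA) range.
link_local_example = ipaddress.ip_address("169.254.0.1")

# The IPv4 link-local block that "Automatic Private IP Addressing" draws from.
apipa_block = ipaddress.ip_network("169.254.0.0/16")

print(placeholder.is_link_local)          # False
print(placeholder in apipa_block)         # False
print(link_local_example.is_link_local)   # True
print(link_local_example in apipa_block)  # True
```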

I originally made this change for the scenario of an initialized deployment where no Postgres members are running and the etcd TTL has expired. In that scenario, the best solution I could come up with was:

  1. governor starts PostgreSQL as a secondary
  2. governor queries all known PostgreSQL members to determine healthiest member
  3. governor only tries to take TTL if it is the healthiest member

Using this method, a deployment can recover from all hosts going down.
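Steps 2 and 3 can be sketched roughly as follows. This is a minimal sketch, not governor's actual code: it assumes "healthiest" means the highest reported xlog position, and the function names and the `xlog_positions` mapping (member name to xlog position in bytes) are hypothetical.

```python
from typing import Dict, Optional


def healthiest_member(xlog_positions: Dict[str, int]) -> Optional[str]:
    """Return the member with the highest xlog position, or None if none are known.

    Assumption: "healthiest" is taken to mean "furthest ahead in the xlog".
    """
    if not xlog_positions:
        return None
    # Iterate names in sorted order so that ties resolve to the same member
    # on every node (max() keeps the first maximum it sees).
    return max(sorted(xlog_positions), key=lambda name: xlog_positions[name])


def should_take_leader_key(me: str, xlog_positions: Dict[str, int]) -> bool:
    """Step 3: only try to take the TTL key if this node is the healthiest."""
    return healthiest_member(xlog_positions) == me


# Hypothetical positions collected in step 2 by querying every known member.
positions = {"pg1": 5000, "pg2": 7200, "pg3": 7100}
print(healthiest_member(positions))              # pg2
print(should_take_leader_key("pg2", positions))  # True
print(should_take_leader_key("pg3", positions))  # False
```

Because every member runs the same comparison over the same data, only one of them should attempt to take the key after a full outage.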

In this governor project, there is one check I did not implement between 2 and 3 above: an "is reasonably healthy" check. For it, each governor would store the last known leader xlog position on every HA loop. Then step 2.5 would be "confirm this host is within some number of megabytes of the last known leader xlog position." This additional check would prevent a stale PostgreSQL from taking over as leader if it is the only member running.
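The unimplemented 2.5 check might look something like this. It is a sketch under stated assumptions: the function name, the 1 MB default threshold, and the choice to pass when no leader position was ever recorded are all hypothetical, and xlog positions are assumed to be plain byte offsets.

```python
from typing import Optional

# Hypothetical threshold; the original comment only says "some megabyte size".
MAX_LAG_BYTES = 1024 * 1024


def is_reasonably_healthy(
    my_xlog: int,
    last_leader_xlog: Optional[int],
    max_lag_bytes: int = MAX_LAG_BYTES,
) -> bool:
    """Step 2.5: confirm this host is close enough to the last known leader."""
    if last_leader_xlog is None:
        # Assumption: with no recorded leader position there is nothing to
        # compare against, so the check does not block promotion.
        return True
    # A host that is ahead of the recorded position has a negative lag
    # and therefore passes.
    return last_leader_xlog - my_xlog <= max_lag_bytes


print(is_reasonably_healthy(10_000_000, 10_500_000))  # True: 500,000 bytes behind
print(is_reasonably_healthy(2_000_000, 10_500_000))   # False: stale member
```

A stale member that is the only one running would fail this check and stay a secondary instead of taking the leader key.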

After I solved for the scenario above, I never had issues with PostgreSQL coming online as a secondary. When the HA loop ran, governor would point the secondary to the proper leader. Thus, it wasn't an issue.

Thanks for the answer @Winslett! About to port all this into Chef and it's definitely useful.