compose/governor

Fatal: requested timeline 8 is not a child of this server's history

tvb opened this issue · 3 comments

tvb commented

Did some startup and shutdown checks and eventually landed in the following state:

LOG:  database system was shut down in recovery at 2015-08-27 11:58:44 CEST
WARNING:  recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command
HINT:  The database server will regularly poll the pg_xlog subdirectory to check for files placed there.
LOG:  entering standby mode
FATAL:  requested timeline 8 is not a child of this server's history
DETAIL:  Latest checkpoint is at 0/15000028 on timeline 7, but in the history of the requested timeline, the server forked off from that timeline at 0/14000198.
LOG:  startup process (PID 2147) exited with exit code 1
LOG:  aborting startup due to startup process failure

Can we somehow come back from this?
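
The DETAIL line explains the first error. An LSN such as 0/15000028 is a 64-bit WAL position written as two hexadecimal halves (high/low). This standby's latest checkpoint (0/15000028, on timeline 7) lies past the point where timeline 8 forked off timeline 7 (0/14000198). That means the local data directory already contains timeline-7 WAL that is not part of timeline 8's history, so it cannot follow the new timeline. Below is a minimal Python sketch of that comparison. It assumes a simplified rule: the checkpoint must lie strictly before the fork point. The helper names are hypothetical and are not governor or PostgreSQL APIs.

```python
def parse_lsn(lsn: str) -> int:
    """Turn an LSN like '0/15000028' into a 64-bit integer (high/low hex halves)."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def can_follow_timeline(checkpoint_lsn: str, fork_lsn: str) -> bool:
    """Simplified rule (an assumption, not PostgreSQL's exact code path): the
    latest checkpoint must lie before the fork point, otherwise the local WAL
    has already diverged from the requested timeline."""
    return parse_lsn(checkpoint_lsn) < parse_lsn(fork_lsn)


checkpoint = "0/15000028"  # latest checkpoint, on timeline 7
fork_point = "0/14000198"  # where timeline 8 forked off timeline 7
print(can_follow_timeline(checkpoint, fork_point))  # False
# LSN differences are byte counts: 0xfffe90, roughly 16 MB of WAL past the fork.
print(hex(parse_lsn(checkpoint) - parse_lsn(fork_point)))
```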

tvb commented

OK, I removed the files from pg_xlog/, base/ and global/. Governor is now trying to start again.

LOG:  database system was shut down in recovery at 2015-08-27 12:21:31 CEST
WARNING:  recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command
HINT:  The database server will regularly poll the pg_xlog subdirectory to check for files placed there.
LOG:  entering standby mode
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
2015-08-27 12:24:18,544 ERROR: Error communicating with Postgresql.  Will try again.
2015-08-27 12:24:18,545 INFO: None
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
Traceback (most recent call last):
  File "./governor.py", line 64, in <module>
    if postgresql.is_leader():
  File "/var/lib/postgresql/governor/helpers/postgresql.py", line 84, in is_leader
    return not self.query("SELECT pg_is_in_recovery();").fetchone()[0]
  File "/var/lib/postgresql/governor/helpers/postgresql.py", line 49, in query
    raise e
psycopg2.OperationalError: FATAL:  the database system is starting up

LOG:  received fast shutdown request
waiting for server to shut down....LOG:  shutting down
LOG:  database system is shut down
 done
server stopped
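
The traceback shows governor.py calling is_leader() while Postgres was still starting up. The resulting psycopg2.OperationalError ("the database system is starting up") was re-raised, and the error ended the process. Below is a minimal sketch of a retry wrapper that tolerates only that transient error. retry_while_starting and its parameters are hypothetical and are not part of governor. It matches on the error text because that text is what appears in the log.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Message PostgreSQL sends to clients while it is still starting up.
STARTING_UP = "the database system is starting up"


def retry_while_starting(
    run_query: Callable[[], T],
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query, retrying only while Postgres says it is starting up.

    Any other error, or the error from the final attempt, is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:  # psycopg2.OperationalError in the traceback above
            if STARTING_UP not in str(exc) or attempt == attempts:
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

As a hypothetical example, this could wrap the query from the traceback: retry_while_starting(lambda: self.query("SELECT pg_is_in_recovery();").fetchone()[0]). It only rides out the startup window. It does not help a standby that can never finish recovery, such as one with the timeline mismatch above.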

tvb commented

OK, basically I did a clean restart by cleaning up as you recommended in #13 (comment).

But now I get:

2015-08-27 13:17:39,940 INFO: Lock owner: sql2; I am sql3
2015-08-27 13:17:39,941 INFO: does not have lock
2015-08-27 13:17:39,946 INFO: no action.  i am a secondary and i am following a leader
FATAL:  database system identifier differs between the primary and standby
DETAIL:  The primary's identifier is 6187648254339472088, the standby's identifier is 6150520899791371405.
FATAL:  database system identifier differs between the primary and standby

This happens when the member's data directory was copied with pg_basebackup from a different Postgres cluster than the one it is trying to stream replication from. It can also happen if you ran initdb on the member instead of running pg_basebackup from the cluster's leader.
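
Each cluster gets its system identifier when initdb runs, and pg_basebackup copies it. As a result, a standby can only stream from a primary with the same value. pg_controldata prints this value as "Database system identifier". Below is a small sketch that compares the two values. It assumes you have captured pg_controldata output from both data directories, and that the output is in English (pg_controldata's labels are translated in other locales). The function names are hypothetical.

```python
import re
from typing import Optional

# pg_controldata prints e.g. "Database system identifier:           6187648254339472088"
_SYSID_LINE = re.compile(r"^Database system identifier:\s*(\d+)\s*$", re.MULTILINE)


def system_identifier(controldata_output: str) -> Optional[int]:
    """Extract the system identifier from pg_controldata output, or None."""
    match = _SYSID_LINE.search(controldata_output)
    return int(match.group(1)) if match else None


def can_stream_from(primary_output: str, standby_output: str) -> bool:
    """A standby can only replicate from a primary with the same identifier."""
    primary_id = system_identifier(primary_output)
    standby_id = system_identifier(standby_output)
    return primary_id is not None and primary_id == standby_id


# Identifiers taken from the DETAIL line in the log above.
primary = "Database system identifier:           6187648254339472088\n"
standby = "Database system identifier:           6150520899791371405\n"
print(can_stream_from(primary, standby))  # False
```

On each host, the text could come from subprocess.run(["pg_controldata", data_dir], capture_output=True, text=True).stdout, where data_dir is that member's data directory.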

I'm assuming you ran initdb for this Postgres directory. If you did, remove that instance's data directory and let governor initialize the empty directory from the cluster's leader. Basically, you should start governor with Postgres uninitialized.
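
The recovery described above has three steps: detect that the member's data directory was initialized locally, move it out of the way, and let a fresh copy come from the leader. Below is a rough sketch of those steps, under two assumptions. First, a directory containing PG_VERSION counts as initialized (every initdb'd data directory has that file). Second, you would rather rename the old directory than delete it, which is a more cautious variant of the advice above. The function names and the replicator role are hypothetical, and governor's own startup code may differ.

```python
import os
import shutil
import time
from typing import List


def is_initialized(data_dir: str) -> bool:
    """True if data_dir looks like an initdb'd cluster (it has a PG_VERSION file)."""
    return os.path.isfile(os.path.join(data_dir, "PG_VERSION"))


def move_aside(data_dir: str) -> str:
    """Rename an initialized data directory and leave an empty one in its place.

    Returns the path the old directory was moved to. Postgres must be stopped.
    """
    backup = f"{data_dir.rstrip(os.sep)}.initdb-{int(time.time())}"
    shutil.move(data_dir, backup)
    # Recreate it empty with the owner-only permissions PostgreSQL expects.
    os.makedirs(data_dir, mode=0o700)
    return backup


def basebackup_command(data_dir: str, host: str, port: int, user: str) -> List[str]:
    """Build a pg_basebackup argv that copies the leader into the empty data_dir."""
    return [
        "pg_basebackup",
        "-D", data_dir,   # target directory; must be empty or absent
        "-h", host,       # the cluster's leader
        "-p", str(port),
        "-U", user,       # a role with the REPLICATION privilege
        "-X", "stream",   # stream the WAL needed to make the copy consistent
    ]
```

After move_aside(), governor should find an empty directory and initialize it from the leader. To copy it by hand instead, you could run subprocess.run(basebackup_command(data_dir, "sql2", 5432, "replicator"), check=True), where sql2 is the lock owner from the log above.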