EnterpriseDB/repmgr

Repmgr : It automatically promotes to new master but other standby stopped


I have an issue, which I also posted on DBA Stack Exchange:
https://dba.stackexchange.com/questions/276557/repmgr-it-automatically-promotes-to-new-master-but-other-standby-stopped

However, I would like to understand what happened.

I tested whether my automatic failover works, and it did: I terminated my primary container, and my secondary container was promoted. Unfortunately, my third container (the other standby) stopped.
Here is the log:
[image: log output]

I'm running the official PostgreSQL 10 Docker image, and here is the script that generates my repmgr.conf:

NET_IF=`netstat -rn | awk '/^0.0.0.0/ {thif=substr($0,74,10); print thif;} /^default.*UG/ {thif=substr($0,65,10); print thif;}'`
NET_IP=`ifconfig ${NET_IF} | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'` 

HOSTNAME='postgres-'${my_node}

cat<<EOF > /etc/repmgr.conf
	node_id=${my_node}
	node_name=$HOSTNAME
	conninfo='host=${NET_IP} user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
	data_directory='${PGDATA}'

	log_level=INFO
	log_facility=STDERR
	log_status_interval=300
	
	pg_bindir='/usr/lib/postgresql/10/bin'
	use_replication_slots=1
	
	failover=automatic
	promote_command='repmgr standby promote'
	follow_command='repmgr standby follow -W'
EOF

I also tried adding this

#	service_start_command='pg_ctl -D ${PGDATA} start'
#	service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
#	service_reload_command='pg_ctl -D ${PGDATA} reload'
#service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

but got the same result.

I hope someone can help me with this.
Thanks,

At this point we haven't made any particular provision for repmgr to run in Docker, so it's possible there may be issues of one kind or another.

Did you try adding these items without the leading #? I.e.

service_start_command='pg_ctl -D ${PGDATA} start'
service_stop_command='pg_ctl -D ${PGDATA} stop -m fast'
service_reload_command='pg_ctl -D ${PGDATA} reload'
service_restart_command='pg_ctl -D ${PGDATA} restart -m fast'

By default, when restarting a node for a standby follow operation, repmgr stops and then starts the server using pg_ctl, as pg_ctl restart has proven to be problematic in some environments. However, the opposite might be the case here. Either way, we strongly recommend using the OS-level service commands where available to avoid issues like this (I'm not sure whether those would be available here).

Also, I see from the Stack Exchange post that you're using repmgr 5.0; we strongly recommend using repmgr 5.1, the latest version.

@ibarwick Yes, I tried it without the '#'.

As for repmgr, here is how I install it:

RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main 10" \
          >> /etc/apt/sources.list.d/pgdg.list
RUN apt-get update; apt-get install -y postgresql-10-repmgr repmgr-common

Could you please help me? Where can I download the 5.1 version? I assume the repmgr commands stay the same and only the version changes.

Anyway, I found it:

RUN curl https://dl.2ndquadrant.com/default/release/get/deb | bash
RUN apt-get update && apt-get install postgresql-11-repmgr repmgr-common -y

I'll try the changes you recommend and get back to you later.

@ibarwick It seems that the Docker image doesn't include the systemctl command.
I also updated repmgr to 5.1, but still no luck.

In that case I'm not sure what can be done. As stated before, we haven't tested this on Docker at all, so it's hard to see what the issue might be. If time permits I'll see if I can reproduce this later in the week, but can't promise anything.

@ibarwick Thanks. How do you start repmgrd, by the way?

This is how I do it:

#!/bin/bash

repmgrd -v 

Aha, if you start it like that, it's probably not daemonizing properly.

Try something like:

repmgrd -f /etc/repmgr.conf --daemonize --pid-file=/tmp/repmgrd.pid >> /tmp/repmgrd.log 2>&1
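
One way to confirm that repmgrd really detached is to wait for the PID file and check that the process it names is alive. The following is a minimal sketch, assuming the same paths as the command above and that the check runs as the same user as repmgrd; `wait_for_pid_file` and `start_repmgrd` are hypothetical helper names, not part of repmgr.

```shell
#!/bin/bash
# Sketch only: paths mirror the command suggested above.
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}
PID_FILE=${PID_FILE:-/tmp/repmgrd.pid}
LOG_FILE=${LOG_FILE:-/tmp/repmgrd.log}

# Hypothetical helper: returns 0 once the PID file exists and names a live
# process, or 1 if that has not happened after roughly <timeout> seconds.
wait_for_pid_file() {
    local pid_file=$1
    local timeout=${2:-10}
    local elapsed=0
    local pid
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -s "$pid_file" ]; then
            pid=$(cat "$pid_file")
            # kill -0 sends no signal; it only tests that the process exists
            # and that we are allowed to signal it.
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Hypothetical helper: start repmgrd detached, then verify it is running.
start_repmgrd() {
    repmgrd -f "$REPMGR_CONF" --daemonize --pid-file="$PID_FILE" >> "$LOG_FILE" 2>&1
    wait_for_pid_file "$PID_FILE" 10
}
```

Calling `start_repmgrd || echo "repmgrd did not start; see $LOG_FILE"` would surface a failed start instead of leaving it unnoticed.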

@ibarwick Do we have to stop the PostgreSQL server whenever we register a node as primary or standby?

@ibarwick I think I have fixed it already.
[image]

Thanks for your help.

Now I still have two remaining tasks:

  1. This line, which I think is dirty:
repmgrd --verbose >> /tmp/repmgrd.log 2>&1
tail -f /tmp/repmgrd.log

I have to tail the log because otherwise the Docker container exits right away.

  2. When I take down the first node and then bring it back up, it still reports itself as primary.
    If you have an approach for making it a standby instead, since a new primary has already been elected, that would be a great help.
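
For the second point, repmgr provides a `repmgr node rejoin` command for bringing a stopped former primary back as a standby. What follows is only a sketch under assumptions, not a tested Docker recipe: PostgreSQL on the old primary must already be shut down cleanly, `--force-rewind` relies on pg_rewind (which needs `wal_log_hints` or data checksums enabled), and the conninfo mirrors the one in the posted repmgr.conf with only the host changed. The function name `rejoin_as_standby` and the host name `postgres-2` are hypothetical.

```shell
#!/bin/bash
# Sketch only: run on the OLD primary, as the postgres user, while its
# PostgreSQL instance is stopped.
REPMGR_BIN=${REPMGR_BIN:-repmgr}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

# Hypothetical helper: $1 is the host of the newly promoted primary.
rejoin_as_standby() {
    local new_primary_host=$1
    # Same credentials as the conninfo in the posted repmgr.conf (assumption).
    local conninfo="host=${new_primary_host} user=repmgr password=repmgr dbname=repmgr connect_timeout=2"

    # First pass: --dry-run only checks the prerequisites and changes nothing.
    "$REPMGR_BIN" node rejoin -f "$REPMGR_CONF" -d "$conninfo" \
        --force-rewind --dry-run || return 1

    # Second pass: perform the rejoin; pg_rewind is used if the timelines diverged.
    "$REPMGR_BIN" node rejoin -f "$REPMGR_CONF" -d "$conninfo" --force-rewind
}
```

Example: `rejoin_as_standby postgres-2`. Afterwards, `repmgr cluster show` should list the old primary as a standby.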

Thanks,
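
For the first point (tailing the log only to keep the container alive), a possible alternative is to keep repmgrd itself in the foreground as the container's main process. This sketch assumes that the installed repmgrd accepts `--daemonize=false` (worth confirming with `repmgrd --help`) and that `log_facility=STDERR` is set as in the posted configuration, so the output ends up in `docker logs`. `run_repmgrd_foreground` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch of a container entrypoint; not an official repmgr or Docker recipe.
REPMGRD_BIN=${REPMGRD_BIN:-repmgrd}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

# Hypothetical helper: run repmgrd without detaching.
run_repmgrd_foreground() {
    # exec replaces this shell with repmgrd, so the container lives exactly
    # as long as repmgrd does and signals from `docker stop` reach it directly.
    exec "$REPMGRD_BIN" -f "$REPMGR_CONF" --daemonize=false --verbose
}
```

Calling `run_repmgrd_foreground` as the last line of the entrypoint script would replace the `repmgrd ... >> /tmp/repmgrd.log` plus `tail -f` pair.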