EnterpriseDB/repmgr

Repmgr marks primary as unreachable but doesn't trigger failover

Opened this issue · 4 comments

Hey,

I have a 3-node cluster deployed on VMs. There were some network issues which caused the primary to become unavailable to the other nodes in the cluster.

When we run the cluster show command, both of the other servers show the primary as unreachable, but repmgr doesn't trigger a failover. Furthermore, repmgr writes no logs about monitoring the primary (which is also the upstream node).

Repmgr works well in other situations, such as a Postgres service crash or a server crash.
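
To reproduce a failure like this, one option (an illustrative sketch, not necessarily the exact outage we hit) is to drop Postgres traffic on the primary VM with iptables:

# run on the primary VM; drops incoming connections to Postgres (port 5432 assumed)
iptables -A INPUT -p tcp --dport 5432 -j DROP
# undo afterwards:
iptables -D INPUT -p tcp --dport 5432 -j DROP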

We have configured repmgr like this:

failover=automatic
reconnect_attempts=4
reconnect_interval=5

This seems to me like the relevant configuration for this problem.
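
For reference, the failover-related portion of repmgr.conf (parameter names as documented by repmgr; everything beyond the three settings above is illustrative) would look roughly like:

failover=automatic
reconnect_attempts=4      # times repmgrd retries the upstream before declaring it failed
reconnect_interval=5      # seconds between retries; 4 x 5 gives a ~20s detection window
priority=100              # this node's weight in the promotion election
promote_command='repmgr standby promote -f /etc/repmgr.conf'   # path is illustrative
follow_command='repmgr standby follow -f /etc/repmgr.conf -W'  # path is illustrative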

Do you have any idea why repmgr doesn't trigger a failover in a situation like this, and doesn't write any logs either?

Experiencing the same issue; help would be appreciated!

Can you confirm that the repmgrd daemon is running on all nodes? The logs from at least one node should clearly show if and when disconnections occurred, assuming the outage actually disrupted the Postgres connections repmgrd maintains to each node.

What does this command show:

repmgr service status

If it is running on all nodes, how are you checking the logs? The repmgr daemon is very chatty even under normal operating circumstances.
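
For example, on each node (paths are illustrative; adjust to your installation):

repmgr -f /etc/repmgr.conf service status   # repmgrd state as seen from this node
repmgr -f /etc/repmgr.conf node check       # per-node health summary
tail -f /var/log/repmgr/repmgrd.log         # repmgrd's own log; actual location depends on your log_file setting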

@bonesmoses same here, with data corruption (different states on different servers).
I hope this information will help.

repmgr diagnostic output from all three nodes
# kk exec -it project***-prod-store-postgresql-ha-postgresql-0 -c postgresql -- bash -i
I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status
postgresql-repmgr 05:35:06.75 
postgresql-repmgr 05:35:06.75 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:35:06.75 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:35:06.75 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:35:06.76 

WARNING: node "project***-prod-store-postgresql-ha-postgresql-1" not found in "pg_stat_replication"
Node "project***-prod-store-postgresql-ha-postgresql-0":
	PostgreSQL version: 11.13
	Total data size: 113 MB
	Conninfo: user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
	Role: primary
	WAL archiving: enabled
	Archive command: /bin/true
	WALs pending archiving: 0 pending files
	Replication connections: 1 (of maximal 16)
	Replication slots: 2 physical (of maximal 10; 0 missing); 1 inactive
	Replication lag: n/a

WARNING: following issue(s) were detected:
  - 1 of 2 downstream nodes not attached:
    - project***-prod-store-postgresql-ha-postgresql-1 (ID: 1001)

  - node has 1 inactive physical replication slots
    - repmgr_slot_1001
HINT: execute "repmgr node check" for more details
I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check
postgresql-repmgr 05:35:09.38 
postgresql-repmgr 05:35:09.38 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:35:09.39 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:35:09.39 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:35:09.39 

WARNING: node "project***-prod-store-postgresql-ha-postgresql-1" not found in "pg_stat_replication"
Node "project***-prod-store-postgresql-ha-postgresql-0":
	Server role: OK (node is primary)
	Replication lag: OK (N/A - node is primary)
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: OK (N/A - node is primary)
	Downstream servers: CRITICAL (1 of 2 downstream nodes not attached; missing: project***-prod-store-postgresql-ha-postgresql-1 (ID: 1001))
	Replication slots: CRITICAL (1 of 2 physical replication slots are inactive)
	Missing physical replication slots: OK (node has no missing physical replication slots)
	Configured data directory: OK (configured "data_directory" is "/bitnami/postgresql/data")
I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show
postgresql-repmgr 05:35:15.13 
postgresql-repmgr 05:35:15.13 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:35:15.14 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:35:15.14 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:35:15.14 

 ID   | Name                                             | Role    | Status               | Upstream                                         | Location | Priority | Timeline | Connection string                                                                                                                                                                                                    
------+--------------------------------------------------+---------+----------------------+--------------------------------------------------+----------+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running            |                                                  | default  | 100      | 33       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! running as primary |                                                  | default  | 100      | 34       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby |   running            | project***-prod-store-postgresql-ha-postgresql-0 | default  | 100      | 33       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5

WARNING: following issues were detected
  - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary

I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event
postgresql-repmgr 05:35:49.42 
postgresql-repmgr 05:35:49.43 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:35:49.43 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:35:49.44 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:35:49.44 

 Node ID | Name                                             | Event                    | OK | Timestamp           | Details                                                                                                                                                                                               
---------+--------------------------------------------------+--------------------------+----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:50:45 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected                                                                                                               
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_start            | t  | 2021-12-13 09:50:27 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                              
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_shutdown         | t  | 2021-12-13 09:50:16 | TERM signal received                                                                                                                                                                                  
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_start            | t  | 2021-12-13 09:49:42 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                  
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:49:41 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected                                                                                                               
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_follow           | t  | 2021-12-13 09:49:39 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                       
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:49:35 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected                                                                                                               
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_register         | t  | 2021-12-13 09:49:32 | standby registration succeeded; upstream node ID is 1000 (-F/--force option was used)                                                                                                                 
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_clone            | t  | 2021-12-13 09:49:25 | cloned from host "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local", port 5432; backup method: pg_basebackup; --force: Y
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:48:53 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected                                                                                                               
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_failover_follow  | t  | 2021-12-13 09:48:48 | node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                      
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow           | t  | 2021-12-13 09:48:48 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                       
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_reload           | t  | 2021-12-13 09:48:47 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                              
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_failover_promote | t  | 2021-12-13 09:48:47 | node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) marked as failed                    
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | standby_promote          | t  | 2021-12-13 09:48:47 | server "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) was successfully promoted to primary                                                                                             
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | child_node_reconnect     | t  | 2021-12-13 09:48:23 | standby node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has reconnected after 73 seconds                                                                                           
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_start            | t  | 2021-12-13 09:48:21 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                  
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow           | t  | 2021-12-13 09:48:20 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                       
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_register         | t  | 2021-12-13 09:48:19 | standby registration succeeded; upstream node ID is 1001 (-F/--force option was used)                                                                                                                 
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_unregister       | t  | 2021-12-13 09:48:19 |                                                                                                                                                                                                       

I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status
postgresql-repmgr 05:36:19.96 
postgresql-repmgr 05:36:19.96 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:36:19.96 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:36:19.96 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:36:19.97 

 ID   | Name                                             | Role    | Status               | Upstream                                         | repmgrd | PID | Paused? | Upstream last seen
------+--------------------------------------------------+---------+----------------------+--------------------------------------------------+---------+-----+---------+--------------------
 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running            |                                                  | running | 1   | no      | n/a                
 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! running as primary |                                                  | running | 1   | no      | n/a                
 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby |   running            | project***-prod-store-postgresql-ha-postgresql-0 | running | 1   | no      | 1 second(s) ago    

WARNING: following issues were detected
  - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary

I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ exit


# kk exec -it project***-prod-store-postgresql-ha-postgresql-1 -c postgresql -- bash -i
I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status
postgresql-repmgr 05:36:47.54 
postgresql-repmgr 05:36:47.55 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:36:47.55 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:36:47.55 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:36:47.55 

Node "project***-prod-store-postgresql-ha-postgresql-1":
	PostgreSQL version: 11.13
	Total data size: 113 MB
	Conninfo: user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
	Role: primary
	WAL archiving: enabled
	Archive command: /bin/true
	WALs pending archiving: 0 pending files
	Replication connections: 0 (of maximal 16)
	Replication slots: 0 physical (of maximal 10; 0 missing)
	Replication lag: n/a

I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check
postgresql-repmgr 05:36:50.03 
postgresql-repmgr 05:36:50.04 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:36:50.06 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:36:50.07 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:36:50.11 

Node "project***-prod-store-postgresql-ha-postgresql-1":
	Server role: OK (node is primary)
	Replication lag: OK (N/A - node is primary)
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: OK (N/A - node is primary)
	Downstream servers: OK (this node has no downstream nodes)
	Replication slots: OK (node has no physical replication slots)
	Missing physical replication slots: OK (node has no missing physical replication slots)
	Configured data directory: OK (configured "data_directory" is "/bitnami/postgresql/data")
I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show
postgresql-repmgr 05:36:53.63 
postgresql-repmgr 05:36:53.64 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:36:53.65 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:36:53.68 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:36:53.68 

 ID   | Name                                             | Role    | Status    | Upstream                                         | Location | Priority | Timeline | Connection string                                                                                                                                                                                                    
------+--------------------------------------------------+---------+-----------+--------------------------------------------------+----------+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | ! running |                                                  | default  | 100      | 33       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
 1001 | project***-prod-store-postgresql-ha-postgresql-1 | primary | * running |                                                  | default  | 100      | 34       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby |   running | project***-prod-store-postgresql-ha-postgresql-0 | default  | 100      | 33       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5

WARNING: following issues were detected
  - node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) is running but the repmgr node record is inactive

I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event
postgresql-repmgr 05:36:55.57 
postgresql-repmgr 05:36:55.58 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:36:55.59 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:36:55.61 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:36:55.63 

 Node ID | Name                                             | Event                    | OK | Timestamp           | Details                                                                                                                                                                                               
---------+--------------------------------------------------+--------------------------+----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_reload           | t  | 2021-12-13 09:50:15 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                              
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_failover_promote | t  | 2021-12-13 09:50:15 | node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) marked as failed                    
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_promote          | t  | 2021-12-13 09:50:14 | server "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) was successfully promoted to primary                                                                                             
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_start            | t  | 2021-12-13 09:49:42 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                  
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:49:41 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected                                                                                                               
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_follow           | t  | 2021-12-13 09:49:39 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                       
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:49:35 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected                                                                                                               
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_register         | t  | 2021-12-13 09:49:32 | standby registration succeeded; upstream node ID is 1000 (-F/--force option was used)                                                                                                                 
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_clone            | t  | 2021-12-13 09:49:25 | cloned from host "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local", port 5432; backup method: pg_basebackup; --force: Y
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:48:53 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected                                                                                                               
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_failover_follow  | t  | 2021-12-13 09:48:48 | node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                      
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow           | t  | 2021-12-13 09:48:48 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                       
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_reload           | t  | 2021-12-13 09:48:47 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                              
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_failover_promote | t  | 2021-12-13 09:48:47 | node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) marked as failed                    
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | standby_promote          | t  | 2021-12-13 09:48:47 | server "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) was successfully promoted to primary                                                                                             
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | child_node_reconnect     | t  | 2021-12-13 09:48:23 | standby node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has reconnected after 73 seconds                                                                                           
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_start            | t  | 2021-12-13 09:48:21 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                  
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow           | t  | 2021-12-13 09:48:20 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                       
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_register         | t  | 2021-12-13 09:48:19 | standby registration succeeded; upstream node ID is 1001 (-F/--force option was used)                                                                                                                 
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_unregister       | t  | 2021-12-13 09:48:19 |                                                                                                                                                                                                       

I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ 
I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status
postgresql-repmgr 05:37:04.17 
postgresql-repmgr 05:37:04.17 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:37:04.18 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:37:04.19 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:37:04.19 

 ID   | Name                                             | Role    | Status    | Upstream                                         | repmgrd | PID | Paused? | Upstream last seen
------+--------------------------------------------------+---------+-----------+--------------------------------------------------+---------+-----+---------+--------------------
 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | ! running |                                                  | running | 1   | no      | n/a                
 1001 | project***-prod-store-postgresql-ha-postgresql-1 | primary | * running |                                                  | running | 1   | no      | n/a                
 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby |   running | project***-prod-store-postgresql-ha-postgresql-0 | running | 1   | no      | 1 second(s) ago    

WARNING: following issues were detected
  - node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) is running but the repmgr node record is inactive


# kk exec -it project***-prod-store-postgresql-ha-postgresql-2 -c postgresql -- bash -i
I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ 
I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status
postgresql-repmgr 05:37:21.87 
postgresql-repmgr 05:37:21.87 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:37:21.88 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:37:21.88 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:37:21.88 

Node "project***-prod-store-postgresql-ha-postgresql-2":
	PostgreSQL version: 11.13
	Total data size: 113 MB
	Conninfo: user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
	Role: standby
	WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
	Archive command: /bin/true
	WALs pending archiving: 0 pending files
	Replication connections: 0 (of maximal 16)
	Replication slots: 0 physical (of maximal 10; 0 missing)
	Upstream node: project***-prod-store-postgresql-ha-postgresql-0 (ID: 1000)
	Replication lag: 0 seconds
	Last received LSN: 1/326922A8
	Last replayed LSN: 1/326922A8

I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check
postgresql-repmgr 05:37:25.23 
postgresql-repmgr 05:37:25.23 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:37:25.24 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:37:25.24 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:37:25.24 

Node "project***-prod-store-postgresql-ha-postgresql-2":
	Server role: OK (node is standby)
	Replication lag: OK (0 seconds)
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: OK (node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) is attached to expected upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000))
	Downstream servers: OK (this node has no downstream nodes)
	Replication slots: OK (node has no physical replication slots)
	Missing physical replication slots: OK (node has no missing physical replication slots)
	Configured data directory: OK (configured "data_directory" is "/bitnami/postgresql/data")
I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show
postgresql-repmgr 05:37:31.11 
postgresql-repmgr 05:37:31.11 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:37:31.12 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:37:31.12 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:37:31.12 

 ID   | Name                                             | Role    | Status               | Upstream                                         | Location | Priority | Timeline | Connection string                                                                                                                                                                                                    
------+--------------------------------------------------+---------+----------------------+--------------------------------------------------+----------+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running            |                                                  | default  | 100      | 33       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! running as primary |                                                  | default  | 100      | 34       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5
 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby |   running            | project***-prod-store-postgresql-ha-postgresql-0 | default  | 100      | 33       | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5

WARNING: following issues were detected
  - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary

I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event
postgresql-repmgr 05:37:34.82 
postgresql-repmgr 05:37:34.82 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:37:34.82 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:37:34.83 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:37:34.83 

 Node ID | Name                                             | Event                    | OK | Timestamp           | Details                                                                                                                                                                                               
---------+--------------------------------------------------+--------------------------+----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:50:45 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected                                                                                                               
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_start            | t  | 2021-12-13 09:50:27 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                              
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_shutdown         | t  | 2021-12-13 09:50:16 | TERM signal received                                                                                                                                                                                  
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_start            | t  | 2021-12-13 09:49:42 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                  
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:49:41 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected                                                                                                               
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_follow           | t  | 2021-12-13 09:49:39 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                       
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:49:35 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected                                                                                                               
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_register         | t  | 2021-12-13 09:49:32 | standby registration succeeded; upstream node ID is 1000 (-F/--force option was used)                                                                                                                 
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | standby_clone            | t  | 2021-12-13 09:49:25 | cloned from host "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local", port 5432; backup method: pg_basebackup; --force: Y
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect   | t  | 2021-12-13 09:48:53 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected                                                                                                               
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_failover_follow  | t  | 2021-12-13 09:48:48 | node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                      
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow           | t  | 2021-12-13 09:48:48 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                       
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_reload           | t  | 2021-12-13 09:48:47 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)                                                                                                              
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_failover_promote | t  | 2021-12-13 09:48:47 | node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) marked as failed                    
 1000    | project***-prod-store-postgresql-ha-postgresql-0 | standby_promote          | t  | 2021-12-13 09:48:47 | server "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) was successfully promoted to primary                                                                                             
 1001    | project***-prod-store-postgresql-ha-postgresql-1 | child_node_reconnect     | t  | 2021-12-13 09:48:23 | standby node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has reconnected after 73 seconds                                                                                           
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_start            | t  | 2021-12-13 09:48:21 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                  
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow           | t  | 2021-12-13 09:48:20 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)                                                                                                       
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_register         | t  | 2021-12-13 09:48:19 | standby registration succeeded; upstream node ID is 1001 (-F/--force option was used)                                                                                                                 
 1002    | project***-prod-store-postgresql-ha-postgresql-2 | standby_unregister       | t  | 2021-12-13 09:48:19 |                                                                                                                                                                                                       

I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status
postgresql-repmgr 05:37:39.23 
postgresql-repmgr 05:37:39.23 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 05:37:39.24 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 05:37:39.24 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 05:37:39.24 

 ID   | Name                                             | Role    | Status               | Upstream                                         | repmgrd | PID | Paused? | Upstream last seen
------+--------------------------------------------------+---------+----------------------+--------------------------------------------------+---------+-----+---------+--------------------
 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running            |                                                  | running | 1   | no      | n/a                
 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! running as primary |                                                  | running | 1   | no      | n/a                
 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby |   running            | project***-prod-store-postgresql-ha-postgresql-0 | running | 1   | no      | 1 second(s) ago    

WARNING: following issues were detected
  - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary
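
Side note on the "registered as standby but running as primary" and "node record is inactive" warnings above: both come from repmgr's metadata, which can be inspected directly. A minimal sketch, assuming psql access as the repmgr user:

# the "active" flag and "type" column are what drive those cluster show warnings
psql -U repmgr -d repmgr -c "SELECT node_id, node_name, type, upstream_node_id, active FROM repmgr.nodes ORDER BY node_id"
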
repmgr.conf from -2 node
event_notification_command='/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"'
ssh_options='-o "StrictHostKeyChecking no" -v'
use_replication_slots='1'
pg_bindir='/opt/bitnami/postgresql/bin'

# FIXME: these 2 parameter should work
node_id=1002
node_name='project***-prod-store-postgresql-ha-postgresql-2'
location='default'
conninfo='user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5'
failover='automatic'
promote_command='PGPASSWORD=anotherpassword*** repmgr standby promote -f "/opt/bitnami/repmgr/conf/repmgr.conf" --log-level DEBUG --verbose'
follow_command='PGPASSWORD=anotherpassword*** repmgr standby follow -f "/opt/bitnami/repmgr/conf/repmgr.conf" -W --log-level DEBUG --verbose'
reconnect_attempts='3'
reconnect_interval='5'
log_level='NOTICE'
priority='100'
degraded_monitoring_timeout='5'
data_directory='/bitnami/postgresql/data'
async_query_timeout='20'
pg_ctl_options='-o "--config-file=\"/opt/bitnami/postgresql/conf/postgresql.conf\" --external_pid_file=\"/opt/bitnami/postgresql/tmp/postgresql.pid\" --hba_file=\"/opt/bitnami/postgresql/conf/pg_hba.conf\""'
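
With reconnect_attempts='3' and reconnect_interval='5' as above, repmgrd should declare the upstream failed after roughly 15 seconds of failed reconnects. A quick way to watch what a standby thinks of its upstream during an outage (a sketch; container paths as in the dumps above, and --upstream assumes a repmgr version with per-check flags):

# poll the upstream check every 5 seconds from inside the standby container
while true; do
  /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh \
    repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check --upstream
  sleep 5
done
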
Logs from -0 node
postgresql-repmgr 09:50:23.11
postgresql-repmgr 09:50:23.13 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 09:50:23.14 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 09:50:23.15 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 09:50:23.16
postgresql-repmgr 09:50:23.29 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 09:50:23.39 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 09:50:23.41 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 09:50:23.42 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 09:50:25.52 WARN  ==> Conflict of pretending primary role nodes (previously: 'project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432', now: 'project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432')
postgresql-repmgr 09:50:25.52 INFO  ==> This node was acting as a primary before restart!
postgresql-repmgr 09:50:25.52 INFO  ==> Can not find new primary. Starting PostgreSQL normally...
postgresql-repmgr 09:50:25.53 INFO  ==> There are no nodes with primary role. Assuming the primary role...
postgresql-repmgr 09:50:25.54 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 09:50:25.56 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 09:50:25.75 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 09:50:25.77 INFO  ==> Initializing Repmgr...
postgresql-repmgr 09:50:25.78 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 09:50:25.78 INFO  ==> Cleaning stale /bitnami/postgresql/data/postmaster.pid file
postgresql-repmgr 09:50:25.79 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 09:50:25.80 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 09:50:25.84 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql-repmgr 09:50:25.88 INFO  ==> Configuring replication parameters
postgresql-repmgr 09:50:25.93 INFO  ==> Configuring fsync
postgresql-repmgr 09:50:25.95 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **

postgresql-repmgr 09:50:26.02 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2021-12-13 09:50:26.626 GMT [181] LOG:  pgaudit extension initialized
2021-12-13 09:50:26.627 GMT [181] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-12-13 09:50:26.627 GMT [181] LOG:  listening on IPv6 address "::", port 5432
2021-12-13 09:50:26.637 GMT [181] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-12-13 09:50:26.664 GMT [181] LOG:  redirecting log output to logging collector process
2021-12-13 09:50:26.664 GMT [181] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2021-12-13 09:50:26.671 GMT [183] LOG:  database system was interrupted; last known up at 2021-12-13 09:49:18 GMT
2021-12-13 09:50:26.844 GMT [183] LOG:  database system was not properly shut down; automatic recovery in progress
2021-12-13 09:50:26.852 GMT [183] LOG:  redo starts at 1/30000028
2021-12-13 09:50:26.882 GMT [183] LOG:  invalid record length at 1/31003D98: wanted 24, got 0
2021-12-13 09:50:26.882 GMT [183] LOG:  redo done at 1/31003D70
2021-12-13 09:50:26.882 GMT [183] LOG:  last completed transaction was at log time 2021-12-13 09:50:16.474731+00
2021-12-13 09:50:26.920 GMT [181] LOG:  database system is ready to accept connections
 done
server started
postgresql-repmgr 09:50:27.05 INFO  ==> ** Starting repmgrd **
[2021-12-13 09:50:27] [NOTICE] repmgrd (repmgrd 5.2.1) starting up
INFO:  set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid
[2021-12-13 09:50:27] [NOTICE] starting monitoring of node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
[2021-12-13 09:50:27] [NOTICE] monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
[2021-12-13 09:50:45] [NOTICE] new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected
2021-12-13 11:12:10.691 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:10.691 GMT [2773] DETAIL:  Key (user_hash)=(274c9456877e6474) already exists.
2021-12-13 11:12:10.691 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:10.670693+00:00'::timestamptz,'2021-12-13T11:12:10.670711+00:00'::timestamptz,'2021-12-13'::date,'274c9456877e6474','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:16.490 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:16.490 GMT [2773] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:16.490 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:16.489950+00:00'::timestamptz,'2021-12-13T11:12:16.489970+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:18.100 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:18.100 GMT [2773] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:18.100 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:18.099218+00:00'::timestamptz,'2021-12-13T11:12:18.099242+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:18.577 GMT [2771] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:18.577 GMT [2771] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:18.577 GMT [2771] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:18.576057+00:00'::timestamptz,'2021-12-13T11:12:18.576074+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:19.523 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:19.523 GMT [2773] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:19.523 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:19.495380+00:00'::timestamptz,'2021-12-13T11:12:19.495408+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:23.589 GMT [2771] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:23.589 GMT [2771] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:23.589 GMT [2771] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:23.582592+00:00'::timestamptz,'2021-12-13T11:12:23.582618+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:27.436 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:27.436 GMT [2773] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:27.436 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:27.434530+00:00'::timestamptz,'2021-12-13T11:12:27.434556+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:28.423 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:28.423 GMT [2773] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:28.423 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:28.420979+00:00'::timestamptz,'2021-12-13T11:12:28.420998+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:30.426 GMT [2771] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:30.426 GMT [2771] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:30.426 GMT [2771] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:30.424756+00:00'::timestamptz,'2021-12-13T11:12:30.424782+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 11:12:31.427 GMT [2773] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-13 11:12:31.427 GMT [2773] DETAIL:  Key (user_hash)=(f7fb2062aaacf611) already exists.
2021-12-13 11:12:31.427 GMT [2773] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:31.425182+00:00'::timestamptz,'2021-12-13T11:12:31.425208+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-13 15:04:26.569 GMT [2429] ERROR:  insert or update on table "apps" violates foreign key constraint "apps_icon_id_fkey"
2021-12-13 15:04:26.569 GMT [2429] DETAIL:  Key (icon_id)=(178) is not present in table "statics".
2021-12-13 15:04:26.569 GMT [2429] STATEMENT:  INSERT INTO "apps" ("created","updated","created_date","slug","store","title","link_by_click_install","icon_id") VALUES ('2021-12-13T15:04:26.321023+00:00'::timestamptz,'2021-12-13T15:04:26.321040+00:00'::timestamptz,'2021-12-13'::date,'test','App Store','test',NULL,178) RETURNING "id"
2021-12-15 03:46:31.041 GMT [415065] ERROR:  duplicate key value violates unique constraint "visitors_user_hash_key"
2021-12-15 03:46:31.041 GMT [415065] DETAIL:  Key (user_hash)=(47acabe37a05742a) already exists.
2021-12-15 03:46:31.041 GMT [415065] STATEMENT:  INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-15T03:46:30.615370+00:00'::timestamptz,'2021-12-15T03:46:30.615432+00:00'::timestamptz,'2021-12-14'::date,'47acabe37a05742a','35.191.10.180','Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Instagram 216.0.0.12.135 (iPhone11,6; iOS 15_1; en_US; en-US; scale=3.00; 1242x2688; 338132253)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id"
2021-12-20 04:07:06.291 GMT [1881286] FATAL:  password authentication failed for user "admin"
2021-12-20 04:07:06.291 GMT [1881286] DETAIL:  Role "admin" does not exist.
        Connection matched pg_hba.conf line 8: "host     all              all       0.0.0.0/0    md5"
2021-12-20 04:07:07.412 GMT [1881296] FATAL:  password authentication failed for user "admin"
2021-12-20 04:07:07.412 GMT [1881296] DETAIL:  Role "admin" does not exist.
        Connection matched pg_hba.conf line 8: "host     all              all       0.0.0.0/0    md5"
2021-12-20 04:11:19.428 GMT [1882075] LOG:  unexpected EOF on client connection with an open transaction
2021-12-20 04:13:01.306 GMT [1882469] FATAL:  password authentication failed for user "postgres"
2021-12-20 04:13:01.306 GMT [1882469] DETAIL:  Password does not match for user "postgres".
        Connection matched pg_hba.conf line 8: "host     all              all       0.0.0.0/0    md5"
2021-12-20 04:19:27.311 GMT [1883873] LOG:  could not send data to client: Connection reset by peer
2021-12-20 04:19:27.311 GMT [1883873] STATEMENT:  COPY public."RawStatistic" (id, created, updated, created_date, action, device, os, os_version, browser, country, language, scroll, source, url, campaign_name, campaign_source, campaign_medium, campaign_term, campaign_content, session_id, visitor_id, screenshot_id, app_variant_id) TO stdout;
2021-12-20 04:19:27.311 GMT [1883873] FATAL:  connection to client lost
2021-12-20 04:19:27.311 GMT [1883873] STATEMENT:  COPY public."RawStatistic" (id, created, updated, created_date, action, device, os, os_version, browser, country, language, scroll, source, url, campaign_name, campaign_source, campaign_medium, campaign_term, campaign_content, session_id, visitor_id, screenshot_id, app_variant_id) TO stdout;
2021-12-20 04:19:55.702 GMT [1883987] ERROR:  canceling statement due to user request
2021-12-20 04:19:55.702 GMT [1883987] STATEMENT:  COPY public."RawStatistic" (id, created, updated, created_date, action, device, os, os_version, browser, country, language, scroll, source, url, campaign_name, campaign_source, campaign_medium, campaign_term, campaign_content, session_id, visitor_id, screenshot_id, app_variant_id) TO stdout;
2021-12-20 04:19:55.704 GMT [1883987] LOG:  could not receive data from client: Connection reset by peer
2021-12-20 05:09:08.244 GMT [1893736] LOG:  invalid length of startup packet
Logs from the -1 node
postgresql-repmgr 09:49:12.69
postgresql-repmgr 09:49:12.77 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 09:49:12.78 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 09:49:12.78 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 09:49:12.78
postgresql-repmgr 09:49:14.52 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 09:49:15.25 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 09:49:15.26 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 09:49:15.27 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 09:49:16.35 INFO  ==> Auto-detected primary node: 'project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432'
postgresql-repmgr 09:49:16.42 INFO  ==> This node was acting as a primary before restart!
postgresql-repmgr 09:49:16.43 INFO  ==> Current master is 'project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432'. Cloning/rewinding it and acting as a standby node...
postgresql-repmgr 09:49:16.65 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 09:49:16.78 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 09:49:17.50 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 09:49:17.55 INFO  ==> Initializing Repmgr...
postgresql-repmgr 09:49:17.59 INFO  ==> Waiting for primary node...
postgresql-repmgr 09:49:17.64 INFO  ==> Cloning data from primary node...
postgresql-repmgr 09:49:27.52 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 09:49:27.76 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 09:49:27.76 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 09:49:28.08 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql-repmgr 09:49:28.20 INFO  ==> Configuring replication parameters
postgresql-repmgr 09:49:28.37 INFO  ==> Configuring fsync
postgresql-repmgr 09:49:28.44 INFO  ==> Setting up streaming replication slave...
postgresql-repmgr 09:49:28.63 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 09:49:31.88 INFO  ==> Unregistering standby node...
postgresql-repmgr 09:49:32.25 INFO  ==> Registering Standby node...
postgresql-repmgr 09:49:32.46 INFO  ==> Check if primary running...
postgresql-repmgr 09:49:32.50 INFO  ==> Waiting for primary node...
postgresql-repmgr 09:49:32.99 INFO  ==> Running standby follow...
postgresql-repmgr 09:49:39.51 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 09:49:39.81 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **

postgresql-repmgr 09:49:39.92 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2021-12-13 09:49:40.209 GMT [288] LOG:  pgaudit extension initialized
2021-12-13 09:49:40.210 GMT [288] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-12-13 09:49:40.210 GMT [288] LOG:  listening on IPv6 address "::", port 5432
2021-12-13 09:49:40.316 GMT [288] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-12-13 09:49:40.451 GMT [288] LOG:  redirecting log output to logging collector process
2021-12-13 09:49:40.451 GMT [288] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2021-12-13 09:49:40.523 GMT [290] LOG:  database system was shut down in recovery at 2021-12-13 09:49:39 GMT
2021-12-13 09:49:40.523 GMT [290] LOG:  entering standby mode
2021-12-13 09:49:40.574 GMT [290] LOG:  redo starts at 1/30000028
2021-12-13 09:49:40.578 GMT [290] LOG:  consistent recovery state reached at 1/31003B00
2021-12-13 09:49:40.578 GMT [290] LOG:  invalid record length at 1/31003B00: wanted 24, got 0
2021-12-13 09:49:40.578 GMT [288] LOG:  database system is ready to accept read only connections
 done
server started
2021-12-13 09:49:40.630 GMT [295] LOG:  started streaming WAL from primary at 1/31000000 on timeline 33
postgresql-repmgr 09:49:40.63 INFO  ==> ** Starting repmgrd **
[2021-12-13 09:49:40] [NOTICE] repmgrd (repmgrd 5.2.1) starting up
INFO:  set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid
[2021-12-13 09:49:42] [NOTICE] starting monitoring of node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)
[2021-12-13 09:49:51] [WARNING] unable to ping "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5"
[2021-12-13 09:49:51] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:49:51] [WARNING] unable to connect to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
[2021-12-13 09:49:56] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr"
[2021-12-13 09:49:56] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:50:01] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr"
[2021-12-13 09:50:01] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:50:06] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr"
[2021-12-13 09:50:06] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:50:06] [WARNING] unable to reconnect to node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) after 3 attempts
[2021-12-13 09:50:06] [NOTICE] promotion candidate is "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)
[2021-12-13 09:50:06] [NOTICE] this node is the winner, will now promote itself and inform other nodes
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
INFO: connected to standby, checking its state
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1001
DEBUG: get_replication_info():
 SELECT ts,         in_recovery,         last_wal_receive_lsn,         last_wal_replay_lsn,         last_xact_replay_timestamp,         CASE WHEN (last_wal_receive_lsn = last_wal_replay_lsn)           THEN 0::INT         ELSE           CASE WHEN last_xact_replay_timestamp IS NULL             THEN 0::INT           ELSE             EXTRACT(epoch FROM (pg_catalog.clock_timestamp() - last_xact_replay_timestamp))::INT           END         END AS replication_lag_time,         last_wal_receive_lsn >= last_wal_replay_lsn AS receiving_streamed_wal,         wal_replay_paused,         upstream_last_seen,         upstream_node_id    FROM (  SELECT CURRENT_TIMESTAMP AS ts,         pg_catalog.pg_is_in_recovery() AS in_recovery,         pg_catalog.pg_last_xact_replay_timestamp() AS last_xact_replay_timestamp,         COALESCE(pg_catalog.pg_last_wal_receive_lsn(), '0/0'::PG_LSN) AS last_wal_receive_lsn,         COALESCE(pg_catalog.pg_last_wal_replay_lsn(),  '0/0'::PG_LSN) AS last_wal_replay_lsn,         CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE           THEN FALSE           ELSE pg_catalog.pg_is_wal_replay_paused()         END AS wal_replay_paused,         CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE           THEN -1           ELSE repmgr.get_upstream_last_seen()         END AS upstream_last_seen,         CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE           THEN -1           ELSE repmgr.get_upstream_node_id()         END AS upstream_node_id           ) q
INFO: searching for primary node
DEBUG: get_primary_connection():
  SELECT node_id, conninfo,          CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority         FROM repmgr.nodes    WHERE active IS TRUE      AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
INFO: checking if node 1000 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL:
could not translate host name "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known

DETAIL: attempted to connect using:
  user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=
INFO: checking if node 1001 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
INFO: checking if node 1002 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_replication_stats():
 SELECT pg_catalog.current_setting('max_wal_senders')::INT AS max_wal_senders,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS attached_wal_receivers,         current_setting('max_replication_slots')::INT AS max_replication_slots,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE slot_type='physical') AS total_replication_slots,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE active IS TRUE AND slot_type='physical')  AS active_replication_slots,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE active IS FALSE AND slot_type='physical') AS inactive_replication_slots,         pg_catalog.pg_is_in_recovery() AS in_recovery
DEBUG: get_active_sibling_node_records():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached     FROM repmgr.nodes n    WHERE n.upstream_node_id = 1000      AND n.node_id != 1001      AND n.active IS TRUE ORDER BY n.node_id
DEBUG: clear_node_info_list() - closing open connections
DEBUG: clear_node_info_list() - unlinking
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  project***-prod-store-postgresql-ha-postgresql-2 (node ID: 1002)
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1001
NOTICE: promoting standby to primary
DETAIL: promoting server "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) using "/opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -w -D '/bitnami/postgresql/data' promote"
2021-12-13 09:50:07.879 GMT [290] LOG:  received promote request
2021-12-13 09:50:07.880 GMT [295] FATAL:  terminating walreceiver process due to administrator command
2021-12-13 09:50:07.881 GMT [290] LOG:  invalid record length at 1/31003CC8: wanted 24, got 0
2021-12-13 09:50:07.881 GMT [290] LOG:  redo done at 1/31003CA0
2021-12-13 09:50:07.881 GMT [290] LOG:  last completed transaction was at log time 2021-12-13 09:49:42.657704+00
2021-12-13 09:50:11.833 GMT [290] LOG:  selected new timeline ID: 34
2021-12-13 09:50:13.598 GMT [290] LOG:  archive recovery complete
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
INFO: standby promoted to primary after 0 second(s)
DEBUG: setting node 1001 as primary and marking existing primary as failed
DEBUG: begin_transaction()
2021-12-13 09:50:14.606 GMT [288] LOG:  database system is ready to accept connections
DEBUG: commit_transaction()
NOTICE: STANDBY PROMOTE successful
DETAIL: server "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) was successfully promoted to primary
DEBUG: _create_event(): event is "standby_promote" for node 1001
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: _create_event():
   INSERT INTO repmgr.events (              node_id,              event,              successful,              details             )       VALUES ($1, $2, $3, $4)    RETURNING event_timestamp
DEBUG: _create_event(): Event timestamp is "2021-12-13 09:50:14.76431+00"
DEBUG: _create_event(): command is '/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"'
INFO: executing notification command for event "standby_promote"
DETAIL: command is:
  /opt/bitnami/repmgr/events/router.sh 1001 standby_promote 1 "2021-12-13 09:50:14.76431+00" "server \"project***-prod-store-postgresql-ha-postgresql-1\" (ID: 1001) was successfully promoted to primary"
DEBUG: clear_node_info_list() - closing open connections
DEBUG: clear_node_info_list() - unlinking
[2021-12-13 09:50:15] [NOTICE] node 1001 has recovered, reconnecting
[2021-12-13 09:50:15] [NOTICE] notifying node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) to follow node 1001
INFO:  node 1002 received notification to follow node 1001
[2021-12-13 09:50:15] [NOTICE] monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)
2021-12-13 11:12:06.332 GMT [2907] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_session_id_fkey"
2021-12-13 11:12:06.332 GMT [2907] DETAIL:  Key (session_id)=(12855) is not present in table "sessions".
2021-12-13 11:12:06.332 GMT [2907] STATEMENT:  INSERT INTO "RawStatistic" ("created","updated","created_date","action","device","os","os_version","browser","country","language","scroll","source","url","campaign_name","campaign_source","campaign_medium","campaign_term","campaign_content","app_variant_id","screenshot_id","session_id","visitor_id") VALUES ('2021-12-13T11:12:05.897118+00:00'::timestamptz,'2021-12-13T11:12:05.897142+00:00'::timestamptz,'2021-12-13'::date,8,'Unknown','Unknown','Unknown','Unknown','US','en-US',0.0,'direct','https://store.tld?args=***',NULL,NULL,NULL,NULL,NULL,75,NULL,12855,8234) RETURNING "id"
2021-12-13 11:12:06.731 GMT [2907] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_session_id_fkey"
2021-12-13 11:12:06.731 GMT [2907] DETAIL:  Key (session_id)=(12855) is not present in table "sessions".
2021-12-13 11:12:06.731 GMT [2907] STATEMENT:  INSERT INTO "RawStatistic" ("created","updated","created_date","action","device","os","os_version","browser","country","language","scroll","source","url","campaign_name","campaign_source","campaign_medium","campaign_term","campaign_content","app_variant_id","screenshot_id","session_id","visitor_id") VALUES ('2021-12-13T11:12:06.729298+00:00'::timestamptz,'2021-12-13T11:12:06.729325+00:00'::timestamptz,'2021-12-13'::date,2,'Unknown','Unknown','Unknown','Unknown','US','en-US',0.0,'direct','https://store.tld?args=***',NULL,NULL,NULL,NULL,NULL,75,NULL,12855,8234) RETURNING "id"
2021-12-13 16:29:58.138 GMT [64721] LOG:  incomplete startup packet
2021-12-14 11:09:35.701 GMT [2608] LOG:  could not receive data from client: Connection reset by peer
2021-12-14 11:58:58.867 GMT [249004] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey"
2021-12-14 11:58:58.867 GMT [249004] DETAIL:  Key (visitor_id)=(8249) is not present in table "visitors".
2021-12-14 11:58:58.867 GMT [249004] STATEMENT:  UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:58:58.861377+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311
2021-12-14 11:59:19.303 GMT [249004] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey"
2021-12-14 11:59:19.303 GMT [249004] DETAIL:  Key (visitor_id)=(8249) is not present in table "visitors".
2021-12-14 11:59:19.303 GMT [249004] STATEMENT:  UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:19.296233+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311
2021-12-14 11:59:27.267 GMT [249004] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey"
2021-12-14 11:59:27.267 GMT [249004] DETAIL:  Key (visitor_id)=(8249) is not present in table "visitors".
2021-12-14 11:59:27.267 GMT [249004] STATEMENT:  UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:27.248728+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311
2021-12-14 11:59:32.276 GMT [249004] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey"
2021-12-14 11:59:32.276 GMT [249004] DETAIL:  Key (visitor_id)=(8249) is not present in table "visitors".
2021-12-14 11:59:32.276 GMT [249004] STATEMENT:  UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:32.254412+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311
2021-12-14 11:59:37.217 GMT [249004] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey"
2021-12-14 11:59:37.217 GMT [249004] DETAIL:  Key (visitor_id)=(8249) is not present in table "visitors".
2021-12-14 11:59:37.217 GMT [249004] STATEMENT:  UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:37.211488+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311
2021-12-14 11:59:44.253 GMT [249004] ERROR:  insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey"
Logs from the -2 node
postgresql-repmgr 09:47:44.57
postgresql-repmgr 09:47:44.57 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 09:47:44.57 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr
postgresql-repmgr 09:47:44.58 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues
postgresql-repmgr 09:47:44.58
postgresql-repmgr 09:47:44.69 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 09:47:44.73 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 09:47:44.74 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 09:47:44.74 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 09:47:44.91 INFO  ==> Auto-detected primary node: 'project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432'
postgresql-repmgr 09:47:44.93 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 09:47:44.95 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 09:47:45.23 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 09:47:45.26 INFO  ==> Initializing Repmgr...
postgresql-repmgr 09:47:45.27 INFO  ==> Waiting for primary node...
postgresql-repmgr 09:47:45.33 INFO  ==> Cloning data from primary node...
postgresql-repmgr 09:48:17.84 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 09:48:17.89 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 09:48:17.89 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 09:48:18.03 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql-repmgr 09:48:18.07 INFO  ==> Configuring replication parameters
postgresql-repmgr 09:48:18.15 INFO  ==> Configuring fsync
postgresql-repmgr 09:48:18.18 INFO  ==> Setting up streaming replication slave...
postgresql-repmgr 09:48:18.27 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 09:48:19.33 INFO  ==> Unregistering standby node...
postgresql-repmgr 09:48:19.52 INFO  ==> Registering Standby node...
postgresql-repmgr 09:48:19.73 INFO  ==> Check if primary running...
postgresql-repmgr 09:48:19.74 INFO  ==> Waiting for primary node...
postgresql-repmgr 09:48:19.78 INFO  ==> Running standby follow...
postgresql-repmgr 09:48:20.39 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped

postgresql-repmgr 09:48:20.53 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **
postgresql-repmgr 09:48:20.61 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2021-12-13 09:48:20.666 GMT [291] LOG:  pgaudit extension initialized
2021-12-13 09:48:20.667 GMT [291] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-12-13 09:48:20.667 GMT [291] LOG:  listening on IPv6 address "::", port 5432
2021-12-13 09:48:20.678 GMT [291] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-12-13 09:48:20.694 GMT [291] LOG:  redirecting log output to logging collector process
2021-12-13 09:48:20.694 GMT [291] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2021-12-13 09:48:20.702 GMT [293] LOG:  database system was shut down in recovery at 2021-12-13 09:48:20 GMT
2021-12-13 09:48:20.703 GMT [293] LOG:  entering standby mode
2021-12-13 09:48:20.709 GMT [293] LOG:  redo starts at 1/2E000028
2021-12-13 09:48:20.710 GMT [293] LOG:  consistent recovery state reached at 1/2F002DE8
2021-12-13 09:48:20.710 GMT [293] LOG:  invalid record length at 1/2F002DE8: wanted 24, got 0
2021-12-13 09:48:20.711 GMT [291] LOG:  database system is ready to accept read only connections
2021-12-13 09:48:20.727 GMT [297] LOG:  started streaming WAL from primary at 1/2F000000 on timeline 32
 done
server started
postgresql-repmgr 09:48:20.76 INFO  ==> ** Starting repmgrd **
[2021-12-13 09:48:20] [NOTICE] repmgrd (repmgrd 5.2.1) starting up
INFO:  set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid
[2021-12-13 09:48:21] [NOTICE] starting monitoring of node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002)
[2021-12-13 09:48:32] [WARNING] unable to ping "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5"
[2021-12-13 09:48:32] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:48:32] [WARNING] unable to connect to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001)
[2021-12-13 09:48:37] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr"
[2021-12-13 09:48:37] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:48:42] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr"
[2021-12-13 09:48:42] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:48:47] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr"
[2021-12-13 09:48:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:48:47] [WARNING] unable to reconnect to node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) after 3 attempts
[2021-12-13 09:48:47] [WARNING] node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) is not in recovery
[2021-12-13 09:48:47] [ERROR] connection to database failed
[2021-12-13 09:48:47] [DETAIL]
fe_sendauth: no password supplied

[2021-12-13 09:48:47] [ERROR] unable to establish a replication connection to the local node
[2021-12-13 09:48:47] [WARNING] not possible to attach to node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000), ignoring
[2021-12-13 09:48:47] [NOTICE] promotion candidate is "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002)
[2021-12-13 09:48:47] [NOTICE] this node is the winner, will now promote itself and inform other nodes
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
INFO: connected to standby, checking its state
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1002
DEBUG: get_replication_info():
 SELECT ts,         in_recovery,         last_wal_receive_lsn,         last_wal_replay_lsn,         last_xact_replay_timestamp,         CASE WHEN (last_wal_receive_lsn = last_wal_replay_lsn)           THEN 0::INT         ELSE           CASE WHEN last_xact_replay_timestamp IS NULL             THEN 0::INT           ELSE             EXTRACT(epoch FROM (pg_catalog.clock_timestamp() - last_xact_replay_timestamp))::INT           END         END AS replication_lag_time,         last_wal_receive_lsn >= last_wal_replay_lsn AS receiving_streamed_wal,         wal_replay_paused,         upstream_last_seen,         upstream_node_id    FROM (  SELECT CURRENT_TIMESTAMP AS ts,         pg_catalog.pg_is_in_recovery() AS in_recovery,         pg_catalog.pg_last_xact_replay_timestamp() AS last_xact_replay_timestamp,         COALESCE(pg_catalog.pg_last_wal_receive_lsn(), '0/0'::PG_LSN) AS last_wal_receive_lsn,         COALESCE(pg_catalog.pg_last_wal_replay_lsn(),  '0/0'::PG_LSN) AS last_wal_replay_lsn,         CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE           THEN FALSE           ELSE pg_catalog.pg_is_wal_replay_paused()         END AS wal_replay_paused,         CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE           THEN -1           ELSE repmgr.get_upstream_last_seen()         END AS upstream_last_seen,         CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE           THEN -1           ELSE repmgr.get_upstream_node_id()         END AS upstream_node_id           ) q
INFO: searching for primary node
DEBUG: get_primary_connection():
  SELECT node_id, conninfo,          CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority         FROM repmgr.nodes    WHERE active IS TRUE      AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
INFO: checking if node 1001 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL:
could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known

DETAIL: attempted to connect using:
  user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=
INFO: checking if node 1000 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
INFO: current primary node is 1000
ERROR: this replication cluster already has an active primary server
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1000
DETAIL: current primary is "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
[2021-12-13 09:48:47] [ERROR] promote command failed
[2021-12-13 09:48:47] [DETAIL] promote command exited with error code 8
[2021-12-13 09:48:47] [ERROR] connection to database failed
[2021-12-13 09:48:47] [DETAIL]
could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known

[2021-12-13 09:48:47] [DETAIL] attempted to connect using:
  user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=
[2021-12-13 09:48:47] [WARNING] unable to ping "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5"
[2021-12-13 09:48:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-12-13 09:48:47] [NOTICE] attempting to follow new primary "project***-prod-store-postgresql-ha-postgresql-0" (node ID: 1000)
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
WARNING: following problems with command line parameters detected:
  --no-wait will be ignored when executing STANDBY FOLLOW
DEBUG: do_standby_follow()
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
INFO: connected to local node
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1002
NOTICE: attempting to find and follow current primary
INFO: searching for primary node
DEBUG: get_primary_connection():
  SELECT node_id, conninfo,          CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority         FROM repmgr.nodes    WHERE active IS TRUE      AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
INFO: checking if node 1001 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL:
could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known

DETAIL: attempted to connect using:
  user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=
INFO: checking if node 1000 is primary
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
INFO: current primary node is 1000
INFO: connected to node 1000, checking for current primary
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1000
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
INFO: follow target is primary node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
DEBUG: local timeline: 32; follow target timeline: 33
DEBUG: get_timeline_history():
TIMELINE_HISTORY 33
DEBUG: local tli: 32; local_xlogpos: 1/2F002FC8; follow_target_history->tli: 32; follow_target_history->end: 1/2F002FC8
INFO: local node 1002 can attach to follow target node 1000
DETAIL: local node's recovery point: 1/2F002FC8; follow target node's fork point: 1/2F002FC8
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1002
INFO: creating replication slot as user "repmgr"
DEBUG: get_slot_record():
SELECT slot_name, slot_type, active   FROM pg_catalog.pg_replication_slots  WHERE slot_name = 'repmgr_slot_1002'
DEBUG: create_replication_slot_sql(): creating slot "repmgr_slot_1002" on upstream
DEBUG: create_replication_slot_sql():
SELECT * FROM pg_catalog.pg_create_physical_replication_slot('repmgr_slot_1002', TRUE)
DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1001
NOTICE: setting node 1002's upstream to node 1000
DEBUG: create_recovery_file(): creating "/bitnami/postgresql/data/recovery.conf"...
DEBUG: recovery.conf line: standby_mode = 'on'

DEBUG: recovery.conf line: primary_conninfo = 'user=repmgr password=anotherpassword*** connect_timeout=5 host=''project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local'' port=5432 application_name=''project***-prod-store-postgresql-ha-postgresql-2'''

DEBUG: recovery.conf line: recovery_target_timeline = 'latest'

DEBUG: recovery.conf line: primary_slot_name = 'repmgr_slot_1002'

DEBUG: is_server_available(): ping status for "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5" is PQPING_OK
NOTICE: stopping server using "/opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -D '/bitnami/postgresql/data' -w -m fast stop"
DEBUG: executing:
  /opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -D '/bitnami/postgresql/data' -w -m fast stop 2>/tmp/repmgr_command.5YBIAb
2021-12-13 09:48:48.029 GMT [291] LOG:  received fast shutdown request
2021-12-13 09:48:48.032 GMT [291] LOG:  aborting any active transactions
2021-12-13 09:48:48.033 GMT [297] FATAL:  terminating walreceiver process due to administrator command
2021-12-13 09:48:48.035 GMT [294] LOG:  shutting down
2021-12-13 09:48:48.050 GMT [291] LOG:  database system is shut down
DEBUG: result of command was 141 (36096)
DEBUG: local_command(): output returned was:
waiting for server to shut down.... done

NOTICE: starting server using "/opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -w -D '/bitnami/postgresql/data' start"
DEBUG: executing:
  /opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -w -D '/bitnami/postgresql/data' start 2>/tmp/repmgr_command.aIkDad
DEBUG: result of command was 141 (36096)
DEBUG: local_command(): output returned was:
waiting for server to shut down.... done
waiting for server to start....2021-12-13 09:48:48.176 GMT [376] LOG:  pgaudit extension initialized

DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL:
could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known

DETAIL: attempted to connect using:
  user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=
WARNING: unable to connect to old upstream node 1001 to remove replication slot
HINT: if reusing this node, you should manually remove any inactive replication slots
DEBUG: update_node_record_status():
    UPDATE repmgr.nodes      SET type = 'standby',          upstream_node_id = 1000,          active = TRUE    WHERE node_id = 1002
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
DEBUG: _create_event(): event is "standby_follow" for node 1002
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: _create_event():
   INSERT INTO repmgr.events (              node_id,              event,              successful,              details             )       VALUES ($1, $2, $3, $4)    RETURNING event_timestamp
DEBUG: _create_event(): Event timestamp is "2021-12-13 09:48:48.268433+00"
DEBUG: _create_event(): command is '/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"'
INFO: executing notification command for event "standby_follow"
DETAIL: command is:
  /opt/bitnami/repmgr/events/router.sh 1002 standby_follow 1 "2021-12-13 09:48:48.268433+00" "standby attached to upstream node \"project***-prod-store-postgresql-ha-postgresql-0\" (ID: 1000)"
INFO:  set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid
[2021-12-13 09:48:48] [NOTICE] node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)
postgresql-repmgr 09:48:48.44 DEBUG ==> Executing SQL command:
SELECT upstream_node_id FROM repmgr.nodes WHERE node_id=1002
[2021-12-13 10:04:38] [ERROR] unable to determine if server is in recovery
[2021-12-13 10:04:38] [DETAIL]
could not receive data from server: Connection timed out

[2021-12-13 10:04:38] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2021-12-13 10:04:38] [NOTICE] local node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002)'s upstream appears to have changed, restarting monitoring
[2021-12-13 10:04:38] [DETAIL] currently monitoring upstream 1001; new upstream is 1000

It looks like an unexpected name-lookup failure during repmgr startup may be the root cause of the cluster getting stuck in a split-brain state. If that's true, then all we need is to add DNS lookup retries here and there...
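
For anyone who wants to experiment with that idea before anything lands upstream, something like the wrapper below could be dropped in front of the container entrypoint. This is only a minimal sketch under stated assumptions: PRIMARY_HOST, the retry counts, and the script itself are placeholders I made up, not anything repmgr or the Bitnami image provides.

#!/bin/bash
# wait-for-dns.sh -- hypothetical wrapper: block until the upstream node's
# headless-service name resolves, then hand off to the real entrypoint.
# PRIMARY_HOST is a placeholder; substitute your own upstream's hostname.
PRIMARY_HOST="postgresql-ha-postgresql-0.postgresql-ha-postgresql-headless.web.svc.cluster.local"
MAX_RETRIES=30
RETRY_INTERVAL=5

for ((i = 1; i <= MAX_RETRIES; i++)); do
    # getent goes through the same NSS resolver path libpq uses, so success
    # here makes the "Name or service not known" failures above unlikely.
    if getent hosts "$PRIMARY_HOST" > /dev/null; then
        echo "DNS for $PRIMARY_HOST resolved after $i attempt(s)"
        exec /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh "$@"
    fi
    echo "attempt $i/$MAX_RETRIES: cannot resolve $PRIMARY_HOST; retrying in ${RETRY_INTERVAL}s"
    sleep "$RETRY_INTERVAL"
done

echo "giving up: $PRIMARY_HOST never resolved" >&2
exit 1

Of course this only papers over startup-time lookups; the mid-flight resolution failures in the promote path above would still need retries inside repmgr itself.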

I'm having the same issue. In my case I was able to narrow it down to certain VM network behaviors: for example, with OpenStack as my VM provider, this state occurs whenever a VM's private network is disconnected. That private network failure also leaves the VM unable to access its own block storage.
It seems that whatever method repmgr uses to check connectivity for the cluster status check (the one that shows the primary as unreachable) is not the same method that triggers failover. I would expect that if losing storage is enough to report the node unreachable (and it is!), then failover should definitely occur here as well.
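For what it's worth, you can reproduce the two kinds of checks by hand to see where they diverge. This is only a sketch of that comparison, not repmgr's actual internal code path: pg_isready performs the same lightweight PQping() handshake that repmgrd logs above (no authentication, no query), while a real psql connection has to resolve DNS, authenticate, and execute SQL. HOST is a placeholder, and the script assumes credentials are supplied via .pgpass or PGPASSWORD.

#!/bin/bash
# compare-checks.sh -- hypothetical probe: contrast a PQping-level check
# with a full authenticated query against the node reported unreachable.
HOST="primary.example.internal"   # placeholder for the suspect primary
PORT=5432

# Level 1: pg_isready uses libpq's PQping() -- the server only has to
# accept the startup packet, so it can pass even with broken storage.
if pg_isready -h "$HOST" -p "$PORT" -t 5; then
    echo "PQping-level check: server responded"
else
    echo "PQping-level check: no response (what repmgrd logged above)"
fi

# Level 2: a real connection must authenticate and run SQL; DNS, auth,
# and storage problems all surface here.
if PGCONNECT_TIMEOUT=5 psql "host=$HOST port=$PORT user=repmgr dbname=repmgr" \
        -Atc 'SELECT pg_is_in_recovery()'; then
    echo "full connection: succeeded"
else
    echo "full connection: failed"
fi

If the first check fails while the second would have succeeded (or vice versa), that gap is exactly the inconsistency described above between "cluster show" marking the primary unreachable and repmgrd declining to fail over.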