/ansible-postgresql-repmgr-ha

Ansible role to configure PostgreSQL, Repmgr for High Availability and automated load balancing with HAProxy, Corosync and Pacemaker

Primary LanguageHTMLMIT LicenseMIT

PostgreSQL and Repmgr for High Availability and automated load balancing with HAProxy, Corosync and Pacemaker


Build Status Ansible Galaxy Ansible Quality Score MIT License

Introduction

Ansible role to configure PostgreSQL, Repmgr for High Availability and automated load balancing with HAProxy, Corosync and Pacemaker

Automated replication and failover

The base functionality offer by the role is to configure the PostgreSQL cluster with replication monitored by Repmgr. You will need to have, ideally, 3 nodes in order to satisfy the qorum size.

Each node should have the postgres_role set to master (exactly 1) and slave (2 or more)

  host_vars:
    node1:
      postgres_role: master
    node2:
      postgres_role: slave
    node3:
      postgres_role: slave

The role will work out of the box, with standard master-slave replication, with no configuration needed. Check defaults/main.yml for optional config vars.

To check the replication status , login as postgres and run:

postgres@node1:~$ psql -Aqtc "SELECT count(*) FROM pg_stat_replication"
2

Managed Replication using Repmgr

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.

In order to turn on the installation and configuration using repmgr, set postgres_repmgr_enabled group variable to true:

group_vars:
  postgres_repmgr_enabled: true

This will be using repmgr instead of pg_basebackup to clone the master and than register it in the repmgr cluster.

Once configured, you can check the replication status via repmgr as follows:

postgres@node1:~$ repmgr -f /etc/repmgr.conf cluster crosscheck
INFO: connecting to database
 Name  | ID | 1 | 2 | 3
-------------+----+---+---+---
 node1 | 1  | * | * | * 
 node2 | 2  | * | * | * 
 node3 | 3  | * | * | * 


postgres@node1:~$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name        | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string                                          
----+-------------+---------+-----------+-------------+----------+----------+----------+-------------------------------------------------------------
 1  | node1 | primary | * running |       | default  | 100      | 1        | host=172.17.0.3 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1 | default  | 100      | 1        | host=172.17.0.4 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1 | default  | 100      | 1        | host=172.17.0.5 dbname=repmgr user=repmgr connect_timeout=2

Load Balancing and Automated Failover

Repmgr will ensure that the replication remains functional by choosing a new master (primary) and adjusting the replication slave (master) targets. The clients won't know however, which is the master and which is the server unless they do some discovery (ex: check replication status).

The approach found in this role takes advantage of the Corosync/Pacemaker to manage iptables rules, by automatically blocking writes on the slaves Repmgr Sentinel triggers failover events.

  • each node will listen on 3 ports 5432 (default), 5434(RO), 5435 (RW), last 2 via iptables prerouting
  • the master will have postgres_role=master corosync note attribute and postgres_role=slave for slaves respectivelly
  • corosync will block port 5435 when redis_role=replica
  • corosync will block port 5435 and 5434 when redis_role=fail
  • the load balancer will try 5435 port for RW and only one server will NOT reject the connections postgres_role=master

Requirements

The nodes should have Corosync with Pacemaker already configured. Check my corosync_pacemaker ansible role . Having symmetric-cluster is also required so that not all nodes will get the resources assigned to them, but only based on pacemaker colocation rules.

playbook.yml:

- name: Prepare Corosync/Pacemaker
  hosts: all
  gather_facts: true
  roles:
    - mariancraciun1983.corosync_pacemaker
    - mariancraciun1983.postgres_repmgr_ha

group_vars/all.yml

# corosync config
install_python3: true
corosync_hacluster_password: 1q2w3e4r5t
corosync_cluster_settings:
  - key: stonith-enabled
    value: "false"
  - key: no-quorum-policy
    value: ignore
  - key: start-failure-is-fatal
    value: "false"
  - key: symmetric-cluster
    value: "false"

# enable repmgr
postgres_repmgr_enabled: true
# enable repmgr integration with pacemaker
postgres_repmgr_pacemaker: true
# this must be true only once, initially, when the Corosync node attributes need to be configured
# after that, repmgr will be triggering node attributes updates in case of a failover
postgres_repmgr_pacemaker_helpers_init: true

Running crm_mon -AnfroRtc will gives us the following:

Cluster Summary:
  * Stack: corosync
  * Current DC: repmgrpmk2 (2) (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Mon Nov 16 05:53:53 2020
  * Last change:  Mon Nov 16 05:53:06 2020 by postgres via crm_attribute on repmgrpmk2
  * 3 nodes configured
  * 6 resource instances configured

Node List:
  * Node repmgrpmk1 (1): online:
    * Resources:
  * Node repmgrpmk2 (2): online:
    * Resources:
      * PostgresqlLBWriteBlock  (ocf::heartbeat:command_raw):    Started
  * Node repmgrpmk3 (3): online:
    * Resources:
      * PostgresqlLBWriteBlock  (ocf::heartbeat:command_raw):    Started

Inactive Resources:
  * Clone Set: PostgresqlLBReadBlock-clone [PostgresqlLBReadBlock]:
    * PostgresqlLBReadBlock     (ocf::heartbeat:command_raw):    Stopped
    * PostgresqlLBReadBlock     (ocf::heartbeat:command_raw):    Stopped
    * PostgresqlLBReadBlock     (ocf::heartbeat:command_raw):    Stopped
    * Stopped: [ repmgrpmk1 repmgrpmk2 repmgrpmk3 ]
  * Clone Set: PostgresqlLBWriteBlock-clone [PostgresqlLBWriteBlock]:
    * PostgresqlLBWriteBlock    (ocf::heartbeat:command_raw):    Started repmgrpmk2
    * PostgresqlLBWriteBlock    (ocf::heartbeat:command_raw):    Started repmgrpmk3
    * PostgresqlLBWriteBlock    (ocf::heartbeat:command_raw):    Stopped
    * Started: [ repmgrpmk2 repmgrpmk3 ]
    * Stopped: [ repmgrpmk1 ]

Node Attributes:
  * Node: repmgrpmk1 (1):
    * postgresql_role                   : primary   
  * Node: repmgrpmk2 (2):
    * postgresql_role                   : replica   
  * Node: repmgrpmk3 (3):
    * postgresql_role                   : replica   

Operations:
  * Node: repmgrpmk2 (2):
    * PostgresqlLBWriteBlock: migration-threshold=1000000:
      * (12) start: last-rc-change="Mon Nov 16 05:53:06 2020" last-run="Mon Nov 16 05:53:06 2020" exec-time="21ms" queue-time="0ms" rc=0 (ok)
      * (13) monitor: interval="10000ms" last-rc-change="Mon Nov 16 05:53:06 2020" exec-time="13ms" queue-time="0ms" rc=0 (ok)
  * Node: repmgrpmk1 (1):
  * Node: repmgrpmk3 (3):
    * PostgresqlLBWriteBlock: migration-threshold=1000000:
      * (12) start: last-rc-change="Mon Nov 16 05:53:06 2020" last-run="Mon Nov 16 05:53:06 2020" exec-time="28ms" queue-time="0ms" rc=0 (ok)
      * (13) monitor: interval="10000ms" last-rc-change="Mon Nov 16 05:53:06 2020" exec-time="16ms" queue-time="0ms" rc=0 (ok)

Others

The role also offers the posibility to use a different/internal ip as follows:

group_vars:
  postgres_use_internal_ip: true
host_vars:
  node1:
    postgres_role: master
    internal_ip: 10.0.0.1
  node2:
    postgres_role: slave
    internal_ip: 10.0.0.2
  node3:
    postgres_role: slave
    internal_ip: 10.0.0.3

TODO

  • integrate barman
  • add witness node configuration

Testing

Molecule with docker is being used with 2 scenarios:

  • default - non-repmgr
  • repmgr
  • repmgr + corosync/pacemaker

Running the tests:

pipenv install
pipenv run molecule test

# or test individual scenario
# standard replication
pipenv run molecule test -s default
# repmgr managed replication and failover
pipenv run molecule test -s repmgr
# repmgr with corosync/pacemaker
pipenv run molecule test -s repmgrpmk

License

MIT License

The code contains the iptables_raw ansible module which is also licensed under MIT License.