Deploy a Production Ready PostgreSQL High-Availability Cluster (based on "Patroni" and "DCS(etcd)"). Automating with Ansible.
This Ansible playbook is designed for deploying a PostgreSQL high availability cluster on dedicated physical servers for a production environment. Сluster can be deployed in virtual machines for test environments and small projects.
This playbook support the deployment of cluster over already existing and running PostgreSQL. You must specify the variable postgresql_exists='true'
in the inventory file.
Attention! Your PostgreSQL will be stopped before running in cluster mode. You must planing downtime of existing databases.
❗ Please test it in your test enviroment before using in a production.
You have two options available for deployment ("Type A" and "Type B"):
To use this scheme, specify
with_haproxy_load_balancing: true
in variable file vars/main.yml
This scheme provides the ability to distribute the load on reading. This also allows us to scale out the cluster (with read-only replicas).
- port 5000 (read / write) master
- port 5001 (read only) all replicas
- port 5002 (read only) synchronous replica only
- port 5003 (read only) asynchronous replicas only
❗ Your application must have support sending read requests to a custom port (ex 5001), and write requests (ex 5000).
Patroni is a template for you to create your own customized, high-availability solution using Python and - for maximum accessibility - a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes. Used for automate the management of PostgreSQL instances and auto failover.
etcd is a distributed reliable key-value store for the most critical data of a distributed system. etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log. It is used by Patroni to store information about the status of the cluster and PostgreSQL configuration parameters.
What is Distributed Consensus?
HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications.
confd manage local application configuration files using templates and data from etcd or consul. Used to automate HAProxy configuration file management.
Keepalived provides a virtual high-available IP address (VIP) and single entry point for databases access. Implementing VRRP (Virtual Router Redundancy Protocol) for Linux. In our configuration keepalived checks the status of the HAProxy service and in case of a failure delegates the VIP to another server in the cluster.
PgBouncer is a connection pooler for PostgreSQL.
This is simple scheme without load balancing Used by default
To provide a single entry point (VIP) for databases access is used "vip-manager".
vip-manager is a service that gets started on all cluster nodes and connects to the DCS. If the local node owns the leader-key, vip-manager starts the configured VIP. In case of a failover, vip-manager removes the VIP on the old leader and the corresponding service on the new leader starts it there.
Written in Go. Cybertec Schönig & Schönig GmbH https://www.cybertec-postgresql.com
RedHat and Debian based distros (x86_64)
- CentOS: 7
- Ubuntu: 16.04
- Debian: 9
✅ tested, works fine: Debian 9/10, Ubuntu 18.04/20.04, CentOS 7.x/8.x
all supported PostgreSQL versions
✅ tested, works fine: PostgreSQL 9.6, 10, 11, 12
Table of results of daily automated testing of cluster deployment:
Distribution | Test result |
---|---|
CentOS 7 | |
CentOS 8 | |
Debian 9 | |
Debian 10 | |
Ubuntu 18.04 | |
Ubuntu 20.04 |
This has been tested on Ansible 2.7.х, 2.8.х, 2.9.х
This playbook requires root privileges or sudo.
Ansible (What is Ansible?)
- linux (Operation System):
Update your operating system on your target servers before deploying;
Make sure you have time synchronization is configured (NTP).
Specify ntp_enabled:'true'
and ntp_servers
if you want to install and configure the ntp service.
- DCS (Distributed Configuration Store):
Fast drives and a reliable network are the most important factors for the performance and stability of an etcd cluster.
Avoid storing etcd data on the same drive along with other processes (such as the database) that are intensively using the resources of the disk subsystem!
Store the etcd and postgresql data on different disks (see etcd_data_dir
variable), use ssd drives if possible.
See hardware recommendations and tuning guides.
Overloaded (highload) database clusters may require the installation of the etcd cluster on dedicated servers, separate from the database servers.
- Placement of cluster members in different data centers:
If you’d prefer a cross-data center setup, where the replicating databases are located in different data centers, etcd member placement becomes critical.
There are quite a lot of things to consider if you want to create a really robust etcd cluster, but there is one rule: do not placing all etcd members in your primary data center. See some examples.
- How to prevent data loss in case of autofailover (synchronous_modes and pg_rewind):
Due to performance reasons, a synchronous replication is disabled by default.
To minimize the risk of losing data on autofailover, you can configure settings in the following way:
- synchronous_mode: 'true'
- synchronous_mode_strict: 'true'
- synchronous_commit: 'on' (or 'remote_write'/'remote_apply')
- use_pg_rewind: 'false' (enabled by default)
- Install Ansible to the managed machine
Example: install latest release using pip
sudo apt install python3-pip sshpass git -y
sudo pip3 install ansible
- Download or clone this repository
git clone https://github.com/vitabaks/postgresql_cluster.git
- Go to the playbook directory
cd postgresql_cluster/
- Edit the inventory file
Specify the ip addresses and connection settings (ansible_user
, ansible_ssh_pass
...) for your environment
vim inventory
- Edit the variable file vars/main.yml
vim vars/main.yml
proxy_env
# if required (for download packages)
example:
proxy_env:
http_proxy: http://proxy_server_ip:port
https_proxy: http://proxy_server_ip:port
cluster_vip
# for client access to databases in the cluster (optional)patroni_cluster_name
with_haproxy_load_balancing
'true'
(Type A) or'false'
/default (Type B)postgresql_version
postgresql_data_dir
- Run playbook:
ansible-playbook deploy_pgcluster.yml
See the vars/main.yml, system.yml and (Debian.yml or RedHat.yml) files for more details.
Add new postgresql node to existing cluster
Click here to expand...
After you successfully deployed your PostgreSQL HA cluster, you may need to scale it further.
Use the add_pgnode.yml
playbook for this.
❕ This playbook does not scaling the etcd cluster and haproxy balancers.
During the run this playbook, the new nodes will be prepared in the same way as when first deployment the cluster. But unlike the initial deployment, all the necessary configuration files will be copied from the master server.
- Add a new node (or subnet) to the
pg_hba.conf
file on all nodes in your cluster - Apply pg_hba.conf for all PostgreSQL (see
patronictl reload --help
)
- Go to the playbook directory
- Edit the inventory file
Specify the ip address of one of the nodes of the cluster in the [master] group, and the new node (which you want to add) in the [replica] group.
- Edit the variable files
Variables that should be the same on all cluster nodes:
with_haproxy_load_balancing
, postgresql_version
, postgresql_data_dir
, postgresql_conf_dir
.
- Run playbook:
ansible-playbook add_pgnode.yml
Add new haproxy balancer node
Click here to expand...
Use the add_balancer.yml
playbook for this.
During the run this playbook, the new balancer node will be prepared in the same way as when first deployment the cluster. But unlike the initial deployment, all necessary configuration files will be copied from the server specified in the [master] group.
❗ Please test it in your test enviroment before using in a production.
-
Go to the playbook directory
-
Edit the inventory file
Specify the ip address of one of the existing balancer nodes in the [master] group, and the new balancer node (which you want to add) in the [balancers] group.
❗ Attention! The list of Firewall ports is determined dynamically based on the group in which the host is specified.
If you adding a new haproxy balancer node to one of the existing nodes from the [etcd_cluster] or [master]/[replica] groups, you can rewrite the iptables rules!
See firewall_allowed_tcp_ports_for.balancers variable in the system.yml file.
- Edit the
main.yml
variable file
Specify with_haproxy_load_balancing: true
- Run playbook:
ansible-playbook add_balancer.yml
Create new clusters from your existing backups with pgBackRest or WAL-G
Point-In-Time-Recovery
Click here to expand...
- Edit the
main.yml
variable file
patroni_cluster_bootstrap_method: "pgbackrest"
patroni_create_replica_methods:
- pgbackrest
- basebackup
postgresql_restore_command: "pgbackrest --stanza={{ pgbackrest_stanza }} archive-get %f %p"
pgbackrest_install: true
pgbackrest_stanza: "stanza_name" # specify your --stanza
pgbackrest_repo_type: "posix" # or "s3"
pgbackrest_repo_host: "ip-address" # dedicated repository host (if repo_type: "posix")
pgbackrest_repo_user: "postgres" # if "repo_host" is set
pgbackrest_conf: # see more options https://pgbackrest.org/configuration.html
global: # [global] section
- {option: "xxxxxxx", value: "xxxxxxx"}
...
stanza: # [stanza_name] section
- {option: "xxxxxxx", value: "xxxxxxx"}
...
pgbackrest_patroni_cluster_restore_command:
'/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --type=time "--target=2020-06-01 11:00:00+03" --delta restore'
example for S3 vitabaks#40 (comment)
- Run playbook:
ansible-playbook deploy_pgcluster.yml
- Edit the
main.yml
variable file
patroni_cluster_bootstrap_method: "wal-g"
patroni_create_replica_methods:
- wal_g
- basebackup
postgresql_restore_command: "wal-g wal-fetch %f %p"
wal_g_install: true
wal_g_ver: "v0.2.15" # version to install
wal_g_json: # see more options https://github.com/wal-g/wal-g#configuration
- {option: "xxxxxxx", value: "xxxxxxx"}
- {option: "xxxxxxx", value: "xxxxxxx"}
...
- Run playbook:
ansible-playbook deploy_pgcluster.yml
You can run automatic restore of your existing patroni cluster
for PITR, specify the required parameters in the main.yml variable file and run the playbook with the tag:
ansible-playbook deploy_pgcluster.yml --tags point_in_time_recovery
Recovery steps with pgBackRest:
1. Stop patroni service on the Replica servers (if running);
2. Stop patroni service on the Master server;
3. Remove patroni cluster "xxxxxxx" from DCS (if exist);
4. Run "/usr/bin/pgbackrest --stanza=xxxxxxx --delta restore" on Master;
5. Run "/usr/bin/pgbackrest --stanza=xxxxxxx --delta restore" on Replica (if patroni_create_replica_methods: "pgbackrest");
6. Waiting for restore from backup (timeout 24 hours);
7. Start PostgreSQL for Recovery (master and replicas);
8. Waiting for PostgreSQL Recovery to complete (WAL apply);
9. Stop PostgreSQL instance (if running);
10. Disable PostgreSQL archive_command (if enabled);
11. Start patroni service on the Master server;
12. Check PostgreSQL is started and accepting connections on Master;
13. Make sure the postgresql users (superuser and replication) are present, and password does not differ from the specified in vars/main.yml;
14. Update postgresql authentication parameter in patroni.yml (if superuser or replication users is changed);
15. Reload patroni service (if patroni.yml is updated);
16. Start patroni service on Replica servers;
17. Check that the patroni is healthy on the replica server (timeout 10 hours);
18. Check postgresql cluster health (finish).
Why disable archive_command?
This is necessary to avoid conflicts in the archived log storage when archiving WALs. When multiple clusters try to send WALs to the same storage.
For example, when you make multiple clones of a cluster from one backup.
You can change this parameter using patronictl edit-config
after restore.
Or set disable_archive_command: false
to not disable archive_command after restore.
Please note that the original design goal of this playbook was more concerned with the initial deploiment of a PostgreSQL HA Cluster and so it does not currently concern itself with performing ongoing maintenance of a cluster.
You should learn each component of the cluster for its further maintenance.
- Tutorial: Management of High-Availability PostgreSQL clusters with Patroni
- Patroni documentation
- etcd operations guide
A high availability cluster provides an automatic failover mechanism, and does not cover all disaster recovery scenarios. You must take care of backing up your data yourself.
Patroni nodes are dumping the state of the DCS options to disk upon for every change of the configuration into the file patroni.dynamic.json located in the Postgres data directory. The master (patroni leader) is allowed to restore these options from the on-disk dump if these are completely absent from the DCS or if they are invalid.
However, I recommend that you read the disaster recovery guide for the etcd cluster:
I can recommend the following backup and restore tools:
Do not forget to validate your backups (for example pgbackrest auto).
Licensed under the MIT License. See the LICENSE file for details.
Vitaliy Kukharik (PostgreSQL DBA) vitabaks@gmail.com
Are welcome!