This repository contains an Ansible playbook to provision and manage a PostgreSQL database on Ubuntu 18.04.
This is a list of recommended best practices from PostgreSQL experts. These best practices are only relevant for Ubuntu 18.04 64-bit.
All configurations are set in /etc/postgresql/12/main/postgresql.conf.
- Don't do OLAP (Online Analytical Processing) where you do OLTP (Online Transactional Processing). A slow query on a specific table will cause a lock queue; every other query on that table will wait and slow down. It's recommended to create a read replica or build the data pipeline outside the PostgreSQL database. OLAP is where you do aggregation (SUM, COUNT, etc.) over a large number of rows. OLTP is where you insert/update/get a few rows in real time.
- Set `listen_addresses = '*'` and then control who can and cannot connect via the `pg_hba.conf` file.
- Set `max_connections` based on node RAM resources. Too many client connections will use up your RAM.
- Set `shared_buffers` to 25% of node RAM.
- Set `effective_cache_size` to 50% of node RAM.
- Set `default_statistics_target` to 100.
- Set `work_mem = (50% * total_ram) / max_connections` as a starting point. Make sure `max_connections` times `work_mem` is not larger than 50% of total RAM.
- `log_temp_files` can be used to log sorts, hashes, and temp files, which is useful for figuring out whether sorts are spilling to disk instead of fitting in memory. Increase `work_mem` if this log shows up.
- Set `maintenance_work_mem` to 64MB.
- Disable `autovacuum` when bulk loading data.
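As a rough illustration of the sizing rules above, here is the arithmetic for a hypothetical node (the 8 GB RAM and 100 connections below are example numbers, not recommendations):

```shell
# Hypothetical node: 8 GB RAM, max_connections = 100.
total_ram_mb=8192
max_connections=100

shared_buffers_mb=$(( total_ram_mb / 4 ))              # 25% of RAM -> 2048MB
effective_cache_size_mb=$(( total_ram_mb / 2 ))        # 50% of RAM -> 4096MB
work_mem_mb=$(( total_ram_mb / 2 / max_connections ))  # (50% * total_ram) / max_connections -> 40MB

echo "shared_buffers = ${shared_buffers_mb}MB"
echo "effective_cache_size = ${effective_cache_size_mb}MB"
echo "work_mem = ${work_mem_mb}MB"
```

Note that `max_connections * work_mem` here is exactly 50% of total RAM, the upper bound stated above; real workloads may want a lower `work_mem`.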
Identifying slow queries and fixing them:
- Set `log_checkpoints = on` to tell PostgreSQL to log checkpoint activity. A query may be slow because it runs while PostgreSQL is writing data during a checkpoint.
- Set `log_connections = on` and `log_disconnections = on`. Many short-lived connections are very bad for performance.
- Set `log_lock_waits = on` to make sure a slow query is not caused by lock waits or deadlocks.
- Set `log_min_duration_statement = 300` to tell PostgreSQL to log all queries that take more than 300 ms.
- Set `log_temp_files = 0` to find queries that use temporary files. Queries that use temporary files tend to be slow.
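Pulling the logging settings above together, the corresponding postgresql.conf fragment would look like this (300 ms is the example threshold used above; tune it for your workload):

```ini
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_min_duration_statement = 300
log_temp_files = 0
```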
References:
- Chris Sinjakli : Lessons Learned the Hard Way / Postgres in Production at GoCardless - PGDayUK'16
- Tuning Your PostgreSQL Server
- Postgres Open 2016 - Identifying Slow Queries and Fixing Them!
Clone this repository:
git clone https://github.com/pyk/postgresql-ansible.git
cd postgresql-ansible/
Create a new Python environment:
python3 -m venv venv
Activate the environment:
source venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
Create a new hosts file with the following content:
[nodes]
NODE_NAME ansible_host=IP_ADDRESS ansible_user=root ansible_python_interpreter=/usr/bin/python3
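For example, a filled-in hosts file might look like this (the node name and IP address below are placeholders, not real values):

```ini
[nodes]
pg-node-1 ansible_host=203.0.113.10 ansible_user=root ansible_python_interpreter=/usr/bin/python3
```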
Run the following command to provision new PostgreSQL database:
ansible-playbook -i hosts provision.yaml
This command will provision and configure a new PostgreSQL database on the
specified hosts, based on the RAM of each node.
To configure the database, update the postgresql.conf and run the following
command:
ansible-playbook -i hosts setup_postgresql_conf.yaml
Be careful: this will restart your PostgreSQL service.
To create new databases, create a new databases.yaml with the following content:
---
databases:
- db_name: db1
user_name: user1
user_pass: pass1
- db_name: db2
user_name: user2
user_pass: pass2
Then run the following command:
ansible-playbook -i hosts setup_databases.yaml
Now you can connect to the database using the created user, either directly on the database node or from a node in the same subnet.
For example, using the first database from the databases.yaml above:
psql -h PRIVATE_IP -U user1 -d db1
I use DigitalOcean to host my PostgreSQL database. You can use my referral link to get $100 of credit for 60 days.