Pigsty

Battery-Included Open-Source PostgreSQL Distribution

Latest Version: v1.0.1 | Public Demo

Documentation: En Docs | 中文文档 | Github Pages

Run make doc to serve EN Docs & ZH Docs on your localhost

Pigsty is the battery-included open-source PostgreSQL distribution
Pigsty is a powerful & professional PostgreSQL monitoring system
Pigsty is a handy & reliable provisioning solution for HA Postgres clusters
Pigsty is a versatile local sandbox for demo, dev, test, data analysis and visualization

Pigsty can be used both for large-scale pg clusters management in real-world prod-env, and for launching versatile local pgsql sandbox for dev, test, demo & data analysis purpose.

Highlights

Battery-Included : deliver all you need to run production-grade databases with one-click.
Monitoring System based on prometheus & grafana & pg_exporter
Provisioning Solution based on ansible in kubernetes style. scale at ease.
HA Architecture based on patroni and haproxy. Self-healing and auto-failover in seconds
Service Discovery and leader election based on consul (or etcd), maintenance made easy.
Offline Installation without Internet access. Fast, secure, and reliable.
Flexible Design makes pigsty fully configurable & customizable & extensible.
Reliable Performance verified in real-world production env (200+nodes, 1PB Data)
Open Source under Apache License 2.0

Distributions

Distribution refers to the overall solution consisting of a kernel and peripheral software packages. For example, Linux is an OS kernel, while RedHat, Debian, and SUSE are OS distributions based on Linux kernel.

Pigsty is an entire solution for using postgres in your production environment. It will setup everything for your with one-click: creating & scaling clusters, switchover & auto failover. manage databases, users, roles, hbas, schemas, hbas with configuration, connection pooling, load balancing, monitoring & logging & alerting, service discovery, etc...

Monitoring

PostgreSQL is the most advanced open source relational database, but its ecosystem lacks a open source monitoring system which is good enough. Pigsty aims to solve this by delivering the best Open Source Monitoring Solution for PostgreSQL.

Pigsty comes with a professional-grade PostgreSQL monitoring system which is specially designed for large-scale postgres cluster management. Including 1200+ metrics, 20+ Dashboards, thousands of panels which covering detailed information from the biggest overview to the smallest individual object. Which brings irreplaceable value for professional users.

Pigsty consists of three monitor apps: pgsql, which focus on time-series metrics. pgcat, which explores database catalog directly. And pglog, which collect realtime logs from postgres, patroni and pgbouncer, and visualize csvlog samples. More will come later.

Pigsty is build upon popular open source components such as Prometheus & Grafana. There's no vendor locking, and the infra can be easily reused for other purpose.

Provisioning

PostgreSQL cluster comes before monitoring system. That's why pigsty is shipping with a handy Provisioning Solution. It allows you to create, update, scale, and manage your postgres cluster in kubernetes style: Describe what your want and pigsty will just do that for ya.

For example, creating a one-leader-with-two-replica cluster pg-test on three nodes requires only a few lines of configuration, and one command pgsql.yml -l pg-test to instantiate it.

pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
    10.10.10.12: {pg_seq: 2, pg_role: replica}
    10.10.10.13: {pg_seq: 3, pg_role: replica}
  vars: 
    pg_cluster: pg-test
    vip_address: 10.10.10.3

Full Example of Cluster Customization

#----------------------------------#
# cluster: pg-test (3-node)        #
#----------------------------------#
pg-meta:                                # required, ansible group name , pgsql cluster name. should be unique among environment
  hosts:                                # `<cluster>.hosts` holds instances definition of this cluster
    10.10.10.11: {pg_seq: 1, pg_role: primary}   # primary instance, leader of cluster
    10.10.10.12: {pg_seq: 2, pg_role: replica}   # replica instance, follower of leader
    10.10.10.13: {pg_seq: 3, pg_role: offline}   # offline instance, replica that allow offline access

    #---------------
    # mandatory                         # all configuration above (`ip`, `pg_seq`, `pg_role`) and `pg_cluster` are mandatory
    #---------------
  vars:                                 # `<cluster>.vars` holds CLUSTER LEVEL CONFIG of this pgsql cluster
    pg_cluster: pg-meta                 # required, pgsql cluster name, unique among cluster, used as namespace of cluster resources

    #---------------
    # optional                          # all configuration below are OPTIONAL for a pgsql cluster (Overwrite global default)
    #---------------
    pg_version: 13                      # pgsql version to be installed (use global version if missing)
    node_tune: tiny                     # node optimization profile: {oltp|olap|crit|tiny}, use tiny for vm sandbox
    pg_conf: tiny.yml                   # pgsql template:  {oltp|olap|crit|tiny}, use tiny for sandbox
    patroni_mode: pause                 # entering patroni pause mode after bootstrap  {default|pause|remove}
    patroni_watchdog_mode: off          # disable patroni watchdog on meta node        {off|require|automatic}
    pg_lc_ctype: en_US.UTF8             # use en_US.UTF8 locale for i18n char support  (required by `pg_trgm`)

    #---------------
    # biz databases                     # Defining Business Databases (Optional)
    #---------------
    pg_databases:                       # define business databases on this cluster, array of database definition
      # define the default `meta` database
      - name: meta                      # required, `name` is the only mandatory field of a database definition
        baseline: cmdb.sql              # optional, database sql baseline path, (relative path among ansible search path, e.g files/)
        # owner: postgres               # optional, database owner, postgres by default
        # template: template1           # optional, which template to use, template1 by default
        # encoding: UTF8                # optional, database encoding, UTF8 by default. (MUST same as template database)
        # locale: C                     # optional, database locale, C by default.  (MUST same as template database)
        # lc_collate: C                 # optional, database collate, C by default. (MUST same as template database)
        # lc_ctype: C                   # optional, database ctype, C by default.   (MUST same as template database)
        # tablespace: pg_default        # optional, default tablespace, 'pg_default' by default.
        # allowconn: true               # optional, allow connection, true by default. false will disable connect at all
        # revokeconn: false             # optional, revoke public connection privilege. false by default. (leave connect with grant option to owner)
        # pgbouncer: true               # optional, add this database to pgbouncer database list? true by default
        comment: pigsty meta database   # optional, comment string for this database
        connlimit: -1                   # optional, database connection limit, default -1 disable limit
        schemas: [pigsty]               # optional, additional schemas to be created, array of schema names
        extensions:                     # optional, additional extensions to be installed: array of schema definition `{name,schema}`
          - {name: adminpack, schema: pg_catalog}    # install adminpack to pg_catalog and install postgis to public
          - {name: postgis, schema: public}          # if schema is omitted, extension will be installed according to search_path.

      # define an additional database named grafana & prometheus (optional)
      - { name: grafana,    owner: dbuser_grafana    , revokeconn: true , comment: grafana    primary database }
      - { name: prometheus, owner: dbuser_prometheus , revokeconn: true , comment: prometheus primary database }

    #---------------
    # biz users                         # Defining Business Users (Optional)
    #---------------
    pg_users:                           # define business users/roles on this cluster, array of user definition
      # define admin user for meta database (This user are used for pigsty app deployment by default)
      - name: dbuser_meta               # required, `name` is the only mandatory field of a user definition
        password: md5d3d10d8cad606308bdb180148bf663e1  # md5 salted password of 'DBUser.Meta'
        # optional, plain text and md5 password are both acceptable (prefixed with `md5`)
        login: true                     # optional, can login, true by default  (new biz ROLE should be false)
        superuser: false                # optional, is superuser? false by default
        createdb: false                 # optional, can create database? false by default
        createrole: false               # optional, can create role? false by default
        inherit: true                   # optional, can this role use inherited privileges? true by default
        replication: false              # optional, can this role do replication? false by default
        bypassrls: false                # optional, can this role bypass row level security? false by default
        pgbouncer: true                 # optional, add this user to pgbouncer user-list? false by default (production user should be true explicitly)
        connlimit: -1                   # optional, user connection limit, default -1 disable limit
        expire_in: 3650                 # optional, now + n days when this role is expired (OVERWRITE expire_at)
        expire_at: '2030-12-31'         # optional, YYYY-MM-DD 'timestamp' when this role is expired  (OVERWRITTEN by expire_in)
        comment: pigsty admin user      # optional, comment string for this user/role
        roles: [dbrole_admin]           # optional, belonged roles. default roles are: dbrole_{admin,readonly,readwrite,offline}
        parameters: {}                  # optional, role level parameters with `ALTER ROLE SET`
        # search_path: public         # key value config parameters according to postgresql documentation (e.g: use pigsty as default search_path)
      - {name: dbuser_view , password: DBUser.Viewer  ,pgbouncer: true ,roles: [dbrole_readonly], comment: read-only viewer for meta database}

      # define additional business users for prometheus & grafana (optional)
      - {name: dbuser_grafana    , password: DBUser.Grafana    ,pgbouncer: true ,roles: [dbrole_admin], comment: admin user for grafana database }
      - {name: dbuser_prometheus , password: DBUser.Prometheus ,pgbouncer: true ,roles: [dbrole_admin], comment: admin user for prometheus database }

HA Clusters

The clusters created by Pigsty are distributed HA postgres database cluster powered by Patroni & HAProxy. As long as any instance in the cluster survives, the cluster serves. Each instance is idempotent from application's point of view.

Sandbox

Pigsty is designed for real world production env with hundreds of high spec nodes, but it can also run inside a tiny 1C|1GB vm node. Which is great for developing, testing, demonstrating, data analysing & visualizing and other purposes.

Pigsty sandbox can be pulled up with one command on your Macbook, powered by virtualbox & vagrant. There are two specs of sandbox: 1 node (the default) and 4 node (full sandbox)

Sandbox Specification

System Requirement

CentOS 7 / Red Hat 7 / Oracle Linux 7 or equivalent
CentOS 7.8.2003 x86_64 is highly recommend (fully tested under production)

Minimal Spec

Self-contained, single meta node, singleton pgsql cluster pg-meta
Minimal requirement: 1 CPU Core & 2 GB RAM

Demo setup ( TINY mode, vagrant demo)

4 Node, including single meta node, singleton database cluster pg-meta and 3-instances pgsql cluster pg-test
Spec: 2Core/4GB for meta controller node, 1Core/1GB for database node (x3)

Production setup (OLTP/OLAP/CRIT mode)

3 meta nodes , up to 100+ database clusters or 400~500 nodes
Verified Spec: Dell R740 / 64 Core / 400GB Mem / 3TB PCI-E SSD

Analysis

Pigsty ships with handy tools such as Jupyterlab, PostgreSQL, Grafana, Echarts. Which is great for data analysis & visualization. You can turn pigsty sandbox into an IDE for making data-intensive applications and demos: Processing data with SQL & Python, Visualize with Grafana & Echarts.

Pigsty comes with two example apps: covid for covid-19 data visualization, and isd for visualizing global surface weather station data.

Get Started

If you are interested in Pigsty, check Getting-Started: Introduction for what's next.

It takes 3 commands to pull up pigsty: download, configure, install

Get a fresh Linux x86_64 CentOS 7.8 node. with nopass sudo & ssh access, then:

# curl -SL https://github.com/Vonng/pigsty/releases/download/v1.0.1/pigsty.tgz -o ~/pigsty.tgz  
# curl -SL https://github.com/Vonng/pigsty/releases/download/v1.0.1/pkg.tgz    -o /tmp/pkg.tgz
git clone https://github.com/Vonng/pigsty && cd pigsty
./configure
make install

Check Quick Start for detail.

If you don't have vm nodes, considering launch pigsty sandbox on your Macbook:

make deps      # Install MacOS deps with homebrew
make dns       # Write static DNS
make start     # Pull-up vm nodes and setup ssh access  (start4 for 4-node demo)
make demo      # install pigsty on 'meta' as above      (demo4  for 4-node demo)

License

Apache Apache License Version 2.0

About

Pigsty (/ˈpɪɡˌstaɪ/) is the abbreviation of "PostgreSQL In Graphic STYle"

Author: Vonng (rh@vonng.com)

License: Apache 2.0 License

smartree/pigsty