stolon is a cloud native PostgreSQL manager for PostgreSQL high availability. It's cloud native because it lets you keep a highly available PostgreSQL inside your containers (Kubernetes integration) but also on any other kind of infrastructure (cloud IaaS, old-style infrastructures, etc.).
For an introduction to stolon, you can also take a look at this post.
- Leverages PostgreSQL streaming replication.
- Resilient to any kind of partitioning. While trying to keep the maximum availability, it prefers consistency over availability.
- Kubernetes integration letting you achieve PostgreSQL high availability.
- Uses a cluster store like etcd or consul as a highly available data store and for leader election.
- Asynchronous (default) and synchronous replication (see the example after this list).
- Full cluster setup in minutes.
- Easy cluster administration.
- Automatic service discovery and dynamic reconfiguration (handles postgres and stolon processes changing their addresses).
- Can use pg_rewind for fast instance resynchronization with the current master.
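Whether replication is currently asynchronous or synchronous can be checked with plain PostgreSQL tooling (this is a generic PostgreSQL query, not a stolon command; host and user below are placeholders):

```
# run against the current master; sync_state is "async" by default and
# "sync"/"potential" when synchronous replication is enabled
psql -h <master-host> -U <superuser> -c \
  "SELECT application_name, state, sync_state FROM pg_stat_replication;"
```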
Stolon is composed of 3 main components:
- keeper: it manages a PostgreSQL instance, converging to the cluster view provided by the sentinel(s).
- sentinel: it discovers and monitors keepers and calculates the optimal cluster view.
- proxy: the client's access point. It enforces connections to the right PostgreSQL master and forcibly closes connections to unelected masters.
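For example, a minimal single-host setup might start one of each component along these lines. This is only a sketch: flag names and defaults vary between stolon versions (check each command's --help), and the cluster name, passwords, and proxy port below are placeholders.

```
# one sentinel: discovers keepers and computes the cluster view
stolon-sentinel --cluster-name stolon-cluster --store-backend etcd

# one keeper: manages a local PostgreSQL instance
stolon-keeper --cluster-name stolon-cluster --store-backend etcd \
  --data-dir data/postgres0 \
  --pg-su-password supassword \
  --pg-repl-username repluser --pg-repl-password replpassword

# one proxy: the entry point for clients
stolon-proxy --cluster-name stolon-cluster --store-backend etcd --port 25432
```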
Stolon is under active development and used in different environments. Its on-disk format (store hierarchy and key contents) will probably change in the future to support new features. If a breaking change is needed, it will be documented in the release notes and an upgrade path will be provided.
In any case, it's quite easy to reset a cluster from scratch while keeping the current master instance running and without losing any data.
- PostgreSQL >= 9.4
- etcd >= 2.0 or consul >=0.6
./build
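After building, the binaries for the three components (plus the stolonctl command line client) should be available; assuming the build script places them under bin/, which may vary between versions:

```
ls bin/
# stolon-keeper  stolon-proxy  stolon-sentinel  stolonctl
```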
Stolon tries to be resilient to any partitioning problem. The cluster view is computed by the leader sentinel and is used to avoid data loss (one example above all: preventing old dead masters that come back from being elected as the new master).
There can be many different partitioning cases. The primary ones are covered by various integration tests (and more will be added in the future).
Since stolon by default prefers consistency over availability, clients need to be connected to the current elected master and disconnected from unelected ones. For example, if you are connected to the current elected master and the cluster subsequently elects a new master (for any valid reason, like a network partition), the client must be disconnected from the old master to preserve consistency (or it will write data to it that will be lost when the old master resyncs). This is the purpose of the stolon proxy.
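In practice this means clients connect to the proxy's address instead of to any individual PostgreSQL instance. For example, assuming a proxy listening on localhost:25432 (the port depends on how the proxy was started):

```
# the proxy routes this connection to the current master and drops it
# if that master gets deposed
psql --host localhost --port 25432 --username <user> <dbname>
```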
To forcibly close connections to unelected masters and to handle keepers/sentinels that can come and go and change their addresses, we implemented a dedicated proxy that reads its state directly from the store. Thanks to Go goroutines it's very fast.
We are open to alternative solutions (PRs are welcome), like using haproxy, if they can meet the above requirements. For example, a hypothetical haproxy-based proxy would need a way to work with changing IP addresses, get the current cluster information, and forcibly close a connection when an haproxy backend is marked as failed (as a note, a possible solution for the latter, which needs testing, would be the on-marked-down shutdown-sessions haproxy server option).
stolon is an open source project under the Apache 2.0 license, and contributions are gladly welcomed!