At PeerDB, we are building a fast, simple, and cost-effective way to stream data from Postgres to a host of data warehouses, queues, and storage engines. If you are running Postgres at the heart of your data stack and move data at scale from Postgres to any of these targets, PeerDB can provide value.
PeerDB was acquired by ClickHouse in July 2024. As part of this acquisition, we're making public the repository that contains the Helm charts used for deploying our Enterprise offering. This will enable people to self-host PeerDB in a more reliable and scalable manner.
PeerDB itself has 5 main services:

- `flow-worker`: The service that actually runs mirrors and does all the data movement. Written in Golang, source code here.
- `flow-snapshot-worker`: Helps `flow-worker` perform the initial snapshot of mirrors. Needs to be available at all times during this phase of a mirror. Shares source code with `flow-worker`.
- `flow-api`: Hosts the gRPC API that actually creates and manages mirrors. `peerdb-ui` and `peerdb-server` depend on this. Shares source code with `flow-worker` and `flow-snapshot-worker`.
- `peerdb-ui`: Intuitive web UI for interacting with peers and mirrors. Written in Next.js, source code here.
- `peerdb-server`: Postgres wire-protocol-compatible SQL query layer; allows creating peers and mirrors via `psql` and other Postgres tooling. Written in Rust, source code here.
For a more detailed overview of PeerDB's architecture, you can look here. Aside from this, PeerDB needs a Postgres database to use as a "catalog" to store configuration, and Temporal for workflow orchestration. Both can either be cloud-based or self-hosted (self-hosted Temporal in turn needs Postgres too), and the charts can be configured according to your needs.
The sections below provide a quick way to get started with using the charts (like a POC). You can jump to the Production Guide after the POC (or whenever you are comfortable).
- helm
- kubectl
- yq
- Golang (if you need to set up the catalog manually)
- k9s for debugging
- `psql` if you need to interface with `peerdb-server`
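As a quick sanity check before proceeding, a small shell snippet (tool names taken from the list above) can confirm the binaries are on `PATH`:

```shell
# Check required local dependencies; report any that are missing.
missing=""
for tool in helm kubectl yq; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "missing required tools:$missing"
else
  echo "all required tools found"
fi
# Optional tools: go (manual catalog setup), k9s (debugging), psql (peerdb-server)
for tool in go k9s psql; do
  command -v "$tool" >/dev/null 2>&1 || echo "optional tool not found: $tool"
done
```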
- Create a Kubernetes cluster on your favorite cloud provider
- A sample node-pool/node-group for following the quickstart guide can look like:
- Number of nodes: 3 (autoscaling recommended)
- vCores: 8
- Memory: 32GB
- Disk: 300GB
- Architecture: x64/ARM64
- Set up your kubectl to point to the cluster
- Make sure all local dependencies are installed
- Make sure the cluster is set up and kubectl is pointing to it
- Clone this repo and create an `.env` file from `.env.template`.
- Set up the in-cluster catalog Postgres:
  - Run `./install_catalog.sh`
  - Run `./test_catalog.sh`
- Install PeerDB:
  - Update `.env` with `PEERDB_PASSWORD` and `PEERDB_UI_PASSWORD`.
    - Also generate a new random string for `PEERDB_UI_NEXTAUTH_SECRET` and set it in `.env`.
  - Run `./install_peerdb.sh` for the first time.
  - Set `PEERDB_UI_SERVICE_URL` in `.env` to the DNS/CNAME/IP of the LoadBalancer created and re-run `./install_peerdb.sh`:
    - Run `kubectl get service peerdb-ui -n peerdb-ns` to get the `external_ip` of the PeerDB UI service. (Change the namespace here if you have set a different namespace.)
    - Set the value of `PEERDB_UI_SERVICE_URL` in `.env` as `http://<external_ip>:3000`.
    - Re-run `./install_peerdb.sh` to update the service with the new DNS/CNAME/IP.
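The LoadBalancer steps can be sketched as a small shell helper. This is a hedged sketch: the namespace `peerdb-ns`, the jsonpath query, the example IP `203.0.113.10`, and the stand-in filename are all illustrative assumptions, not part of the charts.

```shell
# In a real cluster the external IP would come from something like:
#   kubectl get service peerdb-ui -n peerdb-ns \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
EXTERNAL_IP="203.0.113.10"            # placeholder value for illustration
ENV_FILE=".env.quickstart-example"    # stand-in for the real .env
printf 'PEERDB_PASSWORD=peerdb\nPEERDB_UI_SERVICE_URL=\n' > "$ENV_FILE"
# Point PEERDB_UI_SERVICE_URL at the LoadBalancer, then re-run ./install_peerdb.sh
sed -i.bak "s|^PEERDB_UI_SERVICE_URL=.*|PEERDB_UI_SERVICE_URL=http://${EXTERNAL_IP}:3000|" "$ENV_FILE"
grep PEERDB_UI_SERVICE_URL "$ENV_FILE"
```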
Specific changes can be made to `values.customer.yaml` for both the `peerdb` and `peerdb-catalog` Helm charts. `values.customer.yaml` can be backed up as a Kubernetes secret. To enable this, set `SAVE_VALUES_AS_SECRET=true` in the `.env` file.
- Deploy Postgres as needed.
- Update `.env` appropriately with the credentials.
- Set `CATALOG_DEPLOY_ENABLED=false` in `.env`.
  - If using RDS, enable SSL by setting `PG_RDS_SSL_ENABLED=true` in `.env`.
  - If using SSL with another provider, set `TEMPORAL_SSL_MODE=true` in `.env`.
- Run `./install_catalog.sh`; this will set up the schema.
- Run `./test_catalog.sh` to verify that the schema version and permissions are in order.
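For an RDS-backed catalog, the relevant `.env` entries might look like the sketch below; only variables named in this document are shown, the placeholder values are illustrative, and the remaining connection settings come from `.env.template`:

```
CATALOG_DEPLOY_ENABLED=false
PG_USER=<catalog user>
PG_PASSWORD=<catalog password>
PG_RDS_SSL_ENABLED=true
```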
- Set `CATALOG_DEPLOY_ENABLED=true` in `.env`.
- Run `./install_catalog.sh`.
- Run `./test_catalog.sh` to verify that the schema version and permissions are in order once the Postgres pods are up.
NOTE: `PG_PASSWORD` will NOT be used from `.env`; it is auto-generated and can be obtained from the secret `"${CATALOG_DEPLOY_CLUSTER_NAME}-pguser-${PG_USER}"`.
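Reading that secret's value means base64-decoding its data. The sketch below illustrates the decoding step locally; the `kubectl` invocation and jsonpath in the comment are assumptions about how the secret would be fetched in-cluster.

```shell
# In-cluster, the password would be fetched with something like:
#   kubectl get secret "${CATALOG_DEPLOY_CLUSTER_NAME}-pguser-${PG_USER}" \
#     -o jsonpath='{.data.password}' | base64 -d
# Secret values are base64-encoded, so decoding looks like:
encoded=$(printf 'example-password' | base64)
printf '%s' "$encoded" | base64 -d
```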
- Set `DATADOG_ENABLED=true`.
- Set the following parameters:

  ```
  DATADOG_SITE=<Datadog collection site, e.g. us5.datadoghq.com>
  DATADOG_API_KEY=<Datadog API Key>
  DATADOG_CLUSTER_NAME=<Datadog Cluster Name, e.g. customer-name-enterprise>
  ```
The following can be set in the `.env` file to set up credentials to access PeerDB:

```
PEERDB_PASSWORD=peerdb
PEERDB_UI_PASSWORD=peerdb
```

Also set `PEERDB_UI_NEXTAUTH_SECRET` to a random static string:

```
PEERDB_UI_NEXTAUTH_SECRET=<Randomly-Generated-Secret-String>
```
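One way to generate that random string, assuming `openssl` is available locally (any sufficiently random static string works):

```shell
# Generate a 64-character hex string suitable for PEERDB_UI_NEXTAUTH_SECRET.
PEERDB_UI_NEXTAUTH_SECRET=$(openssl rand -hex 32)
echo "PEERDB_UI_NEXTAUTH_SECRET=${PEERDB_UI_NEXTAUTH_SECRET}"
```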
- Authentication for the PeerDB UI and Temporal Web UI can be enabled by setting the following in `.env`:

  ```
  AUTHENTICATION_ENABLED=true
  AUTHENTICATION_CREDENTIALS_USERNAME=<username>
  AUTHENTICATION_CREDENTIALS_PASSWORD=<password>
  ```

  This will disable the `LoadBalancer` for both services and instead create a LoadBalancer for the Authentication Proxy.
- Once Temporal and PeerDB are installed in the cluster, set/update DNS entries starting with `temporal.`, `peerdb.` and `peerdb-ui.` to point to the `LoadBalancer` IP of the `authentication-proxy` service.
- Temporal and the PeerDB UI can be accessed through the DNS names set in the previous step.
The catalog will automatically be set up (with schema updates/migrations) using Kubernetes jobs via the Helm chart. The jobs might go through a few retries before everything reconciles.

NOTE: The catalog can still be set up/upgraded via `./setup_postgres.sh` and `./setup_temporal_schema.sh` in case there is an issue.
- Fill in the `TEMPORAL_CLOUD_HOST`, `TEMPORAL_CLOUD_CERT` and `TEMPORAL_CLOUD_KEY` environment variables in `.env`.
- Fill in `PEERDB_DEPLOYMENT_UID` with an appropriate string to uniquely identify the current deployment.
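As an illustrative sketch only (the exact value formats depend on your Temporal Cloud setup; 7233 is Temporal's standard gRPC port), the entries might look like:

```
TEMPORAL_CLOUD_HOST=<temporal cloud namespace endpoint>:7233
TEMPORAL_CLOUD_CERT=<client mTLS certificate>
TEMPORAL_CLOUD_KEY=<client mTLS private key>
PEERDB_DEPLOYMENT_UID=<unique deployment identifier>
```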
- Run `./install_peerdb.sh` to install/upgrade PeerDB on the Kubernetes cluster.
- Run `kubectl get service peerdb-server -n ${PEERDB_K8S_NAMESPACE}` to get the external IP of the PeerDB server.
- Validate that you are able to access temporal-web by running:

  ```
  kubectl port-forward -n ${TEMPORAL_K8S_NAMESPACE} services/${TEMPORAL_RELEASE_NAME}-web 8080:8080
  ```

- If enabling a service of type LoadBalancer, set `PEERDB_UI_SERVICE_URL` in `.env` to the DNS/CNAME/IP of the LoadBalancer created for the `peerdb-ui` service and re-run `./install_peerdb.sh`. For example:

  ```
  PEERDB_UI_SERVICE_URL=http://aac397508d3594a4494dc9350812c40d-509756028.us-east-1.elb.amazonaws.com:3000
  ```
Setting up resources for PeerDB and the in-cluster catalog is as simple as updating the `values.customer.yaml` file in the respective charts (`peerdb` and `peerdb-catalog`).

- `peerdb/values.customer.yaml`:

  ```yaml
  flowWorker:
    resources:
      requests:
        cpu: 12
        memory: 48Gi
        ephemeral-storage: 384Gi
      limits:
        cpu: 16
        memory: 64Gi
        ephemeral-storage: 512Gi
    replicaCount: 2
  ```

- `peerdb-catalog/values.customer.yaml`:

  ```yaml
  deploy:
    resources:
      requests:
        cpu: 2
        memory: 8Gi
      limits:
        cpu: 2
        memory: 8Gi
  ```
A production setup guide with examples is available in `PRODUCTION.md`.
An insecure cookie needs to be enabled to send commands/signals via the Temporal UI over plain HTTP; this can be added to `peerdb/values.customer.yaml`:

```yaml
temporal-deploy:
  web:
    additionalEnv:
      - name: TEMPORAL_CSRF_COOKIE_INSECURE
        value: 'true'
```