PgDCP is Netspective's approach for using PostgreSQL to host tables, views, plus polyglot stored routines and surface them as GraphQL and REST using Postgraphile, Hasura, and PostgREST (or pREST). Using PostgreSQL whenever possible is called our Zero Middleware automated backend as a service (AutoBaaS) strategy. AutoBaaS helps us eliminate significant amounts of GraphQL and REST boilerplate code plus reduces the number of microservices we need to deploy and manage.
Zero Middleware is first and foremost a technology strategy but also comes with a framework of code generators and reusable components to reduce data-centric code surfaces to just PostgreSQL with auto-generated, database-first secure GraphQL and REST endpoints. For use cases where higher performance or more secure interfaces are necessary, direct access to tables, views, and stored routines using the PostgreSQL wire protocol is encouraged.
We use psql and pure-Postgres migrations as Database as Code ("DaC") for DDL, DQL, DML, etc. when possible. Instead of relying on 3rd-party dependencies for schema migrations, we use PostgreSQL-first stored routines along with psql to manage idempotent migrations.
The overall objective for PgDCP is to reduce the number of tools that a developer needs to know down to just PostgreSQL and psql, along with SQL and PL/* hosted languages, for all services.
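As a minimal sketch of what an idempotent, psql-driven migration might look like (the schema, table, and column names here are illustrative, not part of PgDCP itself):

```sql
-- Idempotent DDL: safe to re-run in every environment.
CREATE SCHEMA IF NOT EXISTS app;

CREATE TABLE IF NOT EXISTS app.customer (
    customer_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name        text NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Idempotent column addition via a guarded DO block.
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.columns
        WHERE table_schema = 'app'
          AND table_name   = 'customer'
          AND column_name  = 'email'
    ) THEN
        ALTER TABLE app.customer ADD COLUMN email text;
    END IF;
END
$$;
```

Because every statement checks for its own preconditions, the same script can be applied by psql repeatedly without external migration state.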
The more we move data from one system to another (e.g. traditional ETL or its cooler cousin ELT), the more likely we are to run into scalability problems. Almost all modern databases are good at querying and can optimize query performance through vertical scaling or, in advanced cases, horizontal scaling. However, no database is good at moving data at scale. The reason we should fear the movement of data is that moving data from one system into another, or vice versa, is almost always the most time-consuming part of data computing stacks.
Whenever dealing with data at scale, bring "compute to data" rather than "moving data to the compute clusters". PgDCP as an approach tries to use FDWs to leave data in their original locations when possible and only move the specific data when it cannot be processed in its original location. If it cannot eliminate data movement from source systems, PgDCP tries to reduce data movement to only a single data move into the PgDCP environment and then uses schemas, views, materialized views, etc. to transform data virtually instead of physically.
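As a hedged illustration of the "compute to data" idea, a foreign data wrapper can expose remote tables locally so they are queried in place (the server name, credentials, and schemas below are placeholders):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Placeholder connection details for a remote source system.
CREATE SERVER source_system
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'source.example.com', dbname 'erp', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER source_system
    OPTIONS (user 'readonly', password 'secret');

-- Surface remote tables without physically copying rows.
CREATE SCHEMA IF NOT EXISTS staging;
IMPORT FOREIGN SCHEMA public
    LIMIT TO (orders, customers)
    FROM SERVER source_system INTO staging;

-- Transform virtually: a view over the foreign tables.
CREATE VIEW recent_orders AS
SELECT o.*, c.name
FROM staging.orders o
JOIN staging.customers c USING (customer_id)
WHERE o.ordered_at > now() - interval '30 days';
```

The view performs the transformation at query time; only the rows a query actually needs ever leave the source system.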
Even when applying Unified Star Schemas (USS), Dimensional Modeling, Data Vault 2.0, or other techniques we try to use late-binding rather than early-binding when possible.
Our first choice of languages for writing data-centric microservices should be:
- Pure SQL views, materialized views, and stored routines using Postgraphile Schema Design.
- pgSQL views and materialized views and PL/pgSQL stored functions and stored procedures when pure SQL is not possible, using Postgraphile Schema Design.
In case SQL or PL/pgSQL is not appropriate:
- PL/Rust, PL/Java, PL/Go, PL/Deno or other type-safe PostgreSQL-hosted language should be prioritized.
- If type-safety is not possible, or access to 3rd-party libraries is more important than type-safety, then PL/Python, PL/Perl, and other languages should be considered.
- When using PL/Python or other languages with package managers, consider using guidance such as programmatic access to PIP modules so that PostgreSQL server admins are not required for administering module dependencies.
- The choice of language should depend on how easy the functionality can be expressed using Postgraphile Schema Design.
Cut Out the Middle Tier: Generating JSON Directly from Postgres is a good HN discussion about the benefits and pitfalls of the PgDCP approach.
PgDCP requires database-first security, which means PostgreSQL schemas, users, roles, permissions, and row-level security (RLS) should drive all data security requirements. Role-based access control (RBAC) and attribute-based access control (ABAC) should be implemented in PostgreSQL stored routines. If necessary, ldap2pg can be used to synchronize roles with LDAP.
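A minimal, hypothetical example of database-first security using roles and row-level security (the tenant model and session-variable name are assumptions for illustration):

```sql
CREATE ROLE app_user NOLOGIN;

CREATE TABLE invoice (
    invoice_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id  text NOT NULL,
    amount     numeric NOT NULL
);

ALTER TABLE invoice ENABLE ROW LEVEL SECURITY;

-- Each session sees only its tenant's rows; the tenant is taken
-- from a session setting established at authentication time.
CREATE POLICY tenant_isolation ON invoice
    FOR ALL TO app_user
    USING (tenant_id = current_setting('app.tenant_id', true));

GRANT SELECT, INSERT, UPDATE ON invoice TO app_user;
```

Because the policy lives in the database, every surface (GraphQL, REST, or direct wire protocol) inherits the same row-level guarantees.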
Because all our API functionality (except for serving the endpoints) will also be in the database we want to ensure that we secure views, stored procedures, and stored functions as if they were the API endpoints. OWASP API Security Project provides some great advice.
Zero Trust is a generally accepted cybersecurity approach that eliminates implicit trust in favor of continuously validating each stage of digital interactions. PgDCP encourages the same "never trust, always verify" posture with Zero Trust SQL (ztSQL). ztSQL is designed to protect database environments and enable faster development by allowing anyone to run any SQL while leveraging row-level security, attribute-based access control, role-based access control, and data segmentation within the database. Granular, "least access" policies should be implemented within the database so that Zero Trust Data Access (ZTDA) is possible.
If all access management is in the database, then securing access to the database is paramount. To that end, see:
- Securing Databases with Dynamic Credentials and HashiCorp Vault
- Using Vault to manage dynamic credentials of a Postgres
In PgDCP, PostgreSQL is not treated as a "bit bucket" where you just store data for an application. It's the center and most important part of our services' universe and requires a deliberate, disciplined, database-first change management approach. While there are many database migration tools like LiquiBase, Flyway, and Sqitch, they all assume that the problem should be solved in a database-independent manner outside of PostgreSQL. Since the PgDCP approach is to double-down on PostgreSQL and not worry about database portability, we want to perform PostgreSQL-first database change management.
First review Martin Fowler's Evolutionary Database Design article and then see tools like:
- Zero-downtime schema migrations in Postgres using views
- postgresql-dbpatch, which support conducting database changes and deploying them in a robust and automated way through the database instead of external tools
- Metagration, a PostgreSQL migration tool written in PostgreSQL
- Konfigraf, a Postgres extension that allows you to store and manipulate data in Git repositories stored in tables within the database. This is designed to be used for storage of configuration data for applications and services.
Observability of the database is important for forensics and quality assurance. Try to use simplified auditing based on SQL logging and FDWs to import logs instead of writing brittle custom triggers on each table/object. Separate observability into DDL changes, which can be alerted upon, as well as DML change logs which can be used for forensics. Wrap the observability of logs into a metrics table that can be scraped by Prometheus and used for alerting (e.g. whenever DDL changes occur, send alerts to anyone who needs to be informed).
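One possible sketch (schema, table, and trigger names are illustrative) of capturing DDL changes into a table that a Prometheus exporter could scrape and alert on:

```sql
CREATE SCHEMA IF NOT EXISTS observability;

CREATE TABLE observability.ddl_activity (
    occurred_at timestamptz NOT NULL DEFAULT now(),
    username    text NOT NULL DEFAULT current_user,
    command_tag text NOT NULL,
    object_id   text
);

-- Event trigger fires at the end of every DDL command and records
-- what changed, by whom, and when.
CREATE OR REPLACE FUNCTION observability.log_ddl()
RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO observability.ddl_activity (command_tag, object_id)
        VALUES (r.command_tag, r.object_identity);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER track_ddl
    ON ddl_command_end
    EXECUTE FUNCTION observability.log_ddl();
```

An exporter can then count rows in `observability.ddl_activity` per interval and raise alerts whenever unexpected DDL occurs.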
Especially important is to integrate OpenTelemetry trace IDs into each DDL and DML statement so that end-to-end traceability becomes native to the database. Being able to track context and propagation of SQL through service layers will be critical to maintaining high quality and reliability.
See semantic conventions for database client calls for how to provide traceability for database client calls and integrate W3C's Trace Context.
Modern applications demand type-safety (which is why PgDCP recommends TypeScript or Rust for applications). Since applications should be type-safe, we want our data models and database objects to also be type-safe. To enhance type-safety, create custom domains, custom enumerations or lookup tables based on inheritance, business types, and inheritable transaction tables ("table types"). Once you're using table inheritance you can use table inheritance wrapper functions.
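A hedged sketch of the kind of type-safety described above, using a custom domain, a custom enumeration, and an inheritable "table type" (the specific names and types are examples, not PgDCP requirements):

```sql
-- Custom domain: reject invalid values at the type level.
CREATE DOMAIN email_address AS text
    CHECK (VALUE ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$');

-- Custom enumeration for a closed value set.
CREATE TYPE order_status AS ENUM
    ('draft', 'submitted', 'fulfilled', 'cancelled');

-- A "table type" that transaction tables can inherit from.
CREATE TABLE base_transaction (
    created_at timestamptz NOT NULL DEFAULT now(),
    created_by text NOT NULL DEFAULT current_user
);

CREATE TABLE purchase_order (
    order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    buyer    email_address NOT NULL,
    status   order_status NOT NULL DEFAULT 'draft'
) INHERITS (base_transaction);
```

Invalid emails or statuses are rejected by the type system itself, so every client (GraphQL, REST, or wire protocol) gets the same guarantees for free.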
All microservice code in PostgreSQL tables, views, functions, and stored procedures will be surfaced through Postgraphile GraphQL first, but our AutoBaaS requirements are that all services should be exposed through safe and secure REST interfaces via PostgREST (or pREST) as a fallback for non-GraphQL clients. We favor Postgraphile's GraphQL API because it generates code which honors PostgreSQL security, roles, and unique features more faithfully than other utilities such as Hasura.
Promote the use of Database Lab Engine (DLE) to "provision independent non-production environments with multi-terabyte PostgreSQL databases in a few seconds." Every developer should be able to have their own easy-to-create-and-teardown development instances for experimentation purposes and to make sure scripts are idempotent and repeatable. If a database cannot be easily torn down and recreated in development and quality assurance environments, scripting is harder.
Part of the DLE is "Joe", which should be used by engineering and QA teams:
“Joe Bot”, a virtual DBA for SQL optimization, is a revolutionary new way to troubleshoot and optimize PostgreSQL query performance. Instead of running EXPLAIN or EXPLAIN (ANALYZE, BUFFERS) directly in production, users send queries for troubleshooting to Joe Bot. Joe Bot uses the Database Lab Engine (DLE) to:
- Generate a fresh thin clone
- Execute the query on the clone
- Return the resulting execution plan to the user
- The returned plan is identical to production in terms of structure and data volumes – this is achieved thanks to two factors: thin clones have the same data and statistics as production (at a specified point in time), and the PostgreSQL planner configuration on clones matches the production configuration.
All code in PostgreSQL should be tested, or assured, with pgTAP code. All Assurance Engineering Cases (AECs) should be written code-first, not human-first (what we call Assurance as Code). If Assurance is done within the database then tools like PostgreSQL Anonymizer should be used to help build test-first databases from production data when appropriate (see this PDF for further elaboration).
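A small illustrative pgTAP test; the table and function names here are hypothetical:

```sql
BEGIN;
SELECT plan(3);

-- Schema assurance: the objects we depend on exist.
SELECT has_table('app', 'customer', 'app.customer table exists');
SELECT has_function('app', 'customer_upsert', 'upsert routine exists');

-- Behavioral assurance: the routine does what it claims.
SELECT is(
    (SELECT app.customer_upsert('Jane', 'jane@example.com') IS NOT NULL),
    true,
    'customer_upsert returns an identifier'
);

SELECT * FROM finish();
ROLLBACK;
```

Running the plan inside a transaction that rolls back keeps Assurance as Code side-effect free and repeatable in any environment.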
Because our responses to bugs in the database which might lead to database crashes are only as good as the number of times we see such crashes, we should use tools like pg_crash, a "Chaos Monkey" extension for PostgreSQL databases. Per their repo, "pg_crash is an extension to PostgreSQL, which allows you to periodically or randomly crash your database infrastructure by sending kill (or other) signals to your DB processes and make them fail. It is ideal for HA and failover testing."
Microsoft Excel should be the first UI that all data access should be designed for when accessing outside of developer-centric PgDCP use cases. If Excel can properly show your data, in a safe, secure, and performant way, then every other client can also do so. Excel-first UX should target "live ODBC" use cases where the database is directly accessed using PostgreSQL binary protocol.
Good security practice for modern apps that will allow record IDs to be shared externally is to use UUID or shortkey (see below) non-serial primary keys. If you use a serial type primary key, never send the PK out for external consumption; always use surrogate keys via uuid-ossp or similar. If you use a serial PK and share the ID externally, it will be possible for external users to "guess" IDs since the PKs would be adjacent numerically.
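For example, a UUID primary key via uuid-ossp (the table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE account (
    -- Non-guessable identifier, safe to share externally.
    account_id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    -- Optional internal sequence kept private, e.g. for compact joins.
    internal_seq bigint GENERATED ALWAYS AS IDENTITY UNIQUE,
    display_name text NOT NULL
);
```

The UUID is the only identifier exposed through APIs; the sequential column, if kept at all, never leaves the database.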
Postgres Notify for Real Time Dashboards provides a great description of how to "push" updates to clients.
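The push pattern can be sketched with a trigger that emits NOTIFY events which a dashboard backend LISTENs for (the channel, function, and table names are illustrative):

```sql
CREATE OR REPLACE FUNCTION notify_metric_change()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Publish the changed row as JSON on the 'dashboard' channel.
    PERFORM pg_notify('dashboard', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$;

CREATE TRIGGER metric_changed
    AFTER INSERT OR UPDATE ON metrics
    FOR EACH ROW EXECUTE FUNCTION notify_metric_change();

-- A dashboard backend subscribes with:
LISTEN dashboard;
```

Clients holding a LISTEN connection receive each payload as soon as the transaction commits, with no polling required.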
Platform and site reliability engineers should review:
- psql command line tutorial and cheat sheet
- Postgres features showcase (commented SQL samples)
- postgres_dba set of useful tools for Postgres DBAs and all engineers
- Set of Practices for common PG engineering suggestions
- pgcenter CLI tool for observing and troubleshooting Postgres
- PGXN client CLI tool to interact with the PostgreSQL Extension Network
- Useful views and functions for PostgreSQL DBAs
- Postgres extension to introspect postgres log files from within the database
- Postgres clone schema utility without the need to go outside of the database. Makes developers' lives easier by running a single function to clone a schema with all its objects. It is very handy on Postgres RDS.
- An unassuming proposal for PLSQL Continuous Integration using revision control, Jenkins and Maven.
- Securing Your PostgreSQL Database
- Advanced multi-threaded PostgreSQL connection pooler and request router
- Postgres Container Apps: Easy Deploy Postgres Apps
Engineers writing stored routines (functions, SPs) should review:
- Boost your User-Defined Functions in PostgreSQL describes some useful techniques for improving UDFs.
- Building a recommendation engine inside Postgres with Python and Pandas
- postgresql-plpython-webservice shows how to cache web service requests via PL/Python
- Metagration, a PostgreSQL migration tool written in PostgreSQL
- BedquiltDB is a JSON document-store built on PostgreSQL using PL/Python
Engineers writing applications should consider these PostgreSQL-native libraries:
- makeExtendSchemaPlugin merges additional GraphQL types and resolvers into a Postgraphile PostgreSQL schema using a similar syntax to graphql-tools.
- uuid-ossp for UUIDs as primary keys
- Universally Unique Lexicographically Sortable Identifier (ULID)
- ltree for representing labels of data stored in a hierarchical tree-like structure
- pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching
- simplified auditing based on SQL logging and FDWs to import
logs instead of writing triggers
- Audit Trigger 91plus generic trigger function used for recording changes to tables into an audit log table
- pgMemento provides an audit trail for your data inside a PostgreSQL database using triggers and server-side functions written in PL/pgSQL
- Temporal Tables PostgreSQL Extension
- pg_cron to run periodic jobs in PostgreSQL
- shortkey for YouTube-like Short IDs as Postgres Primary Keys
- dexter automatic indexer
- message-db message and event store
- RecDB Recommendation Engine
- pg_similarity extension to support similarity queries on PostgreSQL
- dox Document Database API extension when needing simple JSON store
- colpivot.sql dynamic row to column pivotation/transpose
- Guidance to implement NIST level 2 RBAC Hierarchical RBAC in PostgreSQL
- ldap2pg to synchronize Postgres roles and privileges from YAML or LDAP
- SeeQR database analytic tool to compare the efficiency of different schemas and queries on a granular level
- postgres-basename-dirname contains functions which provide equivalents to the POSIX functions basename and dirname
- postgresql-similarity extension package which provides functions that calculate similarity between two strings
- pg_median_utils functions like median_filter which behaves the same as SciPy's medfilt
- orafce ORACLE compatibility functions
- postgres-typescript generates typescript functions from SQL files and lets you call these functions from your app
- SPARQL compiler functions for PostgreSQL
- PL/pgSQL implementation of hashids library, another alternative: plpg_hashids
- Helpers for PL/pgSQL applications
- session_variable Postgres database extension provides a way to create and maintain session scoped variables and constants, more or less like Oracle's global variables
- Git Based Application Configuration, a Postgres extension that allows you to store and manipulate data in Git repositories stored in tables within the database. This is designed to be used for storage of configuration data for applications and services.
- PostgreSQL extension with support for version string comparison
Engineers writing SQL-first code should use the following tools:
- sqlcheck and plpgsql_check for linting SQL source code
- pgTAP - Database testing framework for Postgres
- pgcmp for comparing Postgres database schemas
- Web-based Explain Visualizer (pev) and CLI query visualizer (gocmdpev) for performance optimization
- JSON Schema validation for PostgreSQL when using JSON and JSONB columns
- Use readable database errors as a guide for creating errors in the database which can be used in the front-end
- postgresqltuner script to analyse PostgreSQL database configuration, and give tuning advice
- Use HyperLogLog data structures and TopN PostgreSQL extension for higher performing value counting when data amounts get large
- See GraphQL for Postgres which teaches techniques for how to parse GraphQL queries and transform them into SQL, all inside PostgreSQL (this is not production-level code but is good for education)
- Plugin for prettier to support formatting of PostgreSQL-flavour SQL, including function bodies in SQL, pgSQL, PLV8, plpython, etc.
Engineers needing to instrument PostgreSQL:
Machine Learning without leaving PostgreSQL:
- Apache MADlib
- mindsdb.com for machine Learning without leaving the database
Content engineers who need datasets:
- pgloader loads data from various sources into PostgreSQL
- ISO-3166 - All countries and subcountries in the world
Precision Knowledge:
- Lesser Known PostgreSQL Features
- Evolutionary Database Design
- Intuit’s Data Mesh Strategy
- Web development platform built entirely in PostgreSQL
- Postgres Analytics - Tips, best practices & extensions
- OWASP API Security Project
- Beyond REST is Netflix's approach to Rapid Development with GraphQL Microservices
- 6 Common Mistakes for SQL Queries that "Should be Working"
- Postgres Notify for Real Time Dashboards
- Postgres regex search over 10,000 GitHub repositories (using only a Macbook)
- How to Modernize Your Data & Analytics Platform parts one, two, three, and four.
If a custom microservice is completely stateless and does not have anything to do with reading or writing structured data, it should be written in TypeScript hosted on Deno, or as another type of microservice using the Microsoft Dapr sidecar.