PgDCP is Netspective's approach for using PostgreSQL to host tables, views, plus polyglot stored routines and surface them as GraphQL and REST using Postgraphile, Hasura, and PostgREST (or pREST). Using PostgreSQL whenever possible is called our Zero Middleware automated backend as a service (AutoBaaS) strategy. AutoBaaS helps us eliminate significant amounts of GraphQL and REST boilerplate code plus reduces the number of microservices we need to deploy and manage.
Zero Middleware is first and foremost a technology strategy but also comes with a framework of code generators and reusable components to reduce data-centric code surfaces to just PostgreSQL with auto-generated, database-first secure GraphQL and REST endpoints. For use cases where higher performance or more secure interfaces are necessary, direct access to tables, views, and stored routines using the PostgreSQL wire protocol is encouraged.
We use psql and pure-Postgres migrations as Database as Code ("DaC") for DDL, DQL, DML, etc. when possible. Instead of relying on 3rd-party dependencies for schema migrations, we use PostgreSQL-first stored routines along with psql to manage idempotent migrations.
The overall objective for PgDCP is to reduce the number of tools that a developer needs to know down to just PostgreSQL and psql, along with SQL and PL/* hosted languages, for all services.
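As a minimal sketch of what an idempotent, psql-driven migration might look like (the schema, table, and column names here are illustrative, not part of PgDCP itself):

```sql
-- Idempotent DDL: safe to re-run in every environment.
CREATE SCHEMA IF NOT EXISTS app;

CREATE TABLE IF NOT EXISTS app.customer (
    customer_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name        text NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Idempotent column addition via a guarded DO block.
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.columns
        WHERE table_schema = 'app'
          AND table_name   = 'customer'
          AND column_name  = 'email'
    ) THEN
        ALTER TABLE app.customer ADD COLUMN email text;
    END IF;
END
$$;
```

Because every statement checks for its own preconditions, the same script can be applied by psql repeatedly without external migration state.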
The more we move data from one system to another (e.g. traditional ETL or its cooler cousin ELT), the more likely we are to run into scalability problems. Almost all modern databases are good at querying and can optimize query performance through vertical scaling or, in advanced cases, horizontal scaling. However, no database is good at moving data at scale. The reason we should fear the movement of data is that moving data from one system into another, or vice versa, is almost always the most time-consuming part of data computing stacks.
Whenever dealing with data at scale, bring "compute to data" rather than "moving data to the compute clusters". PgDCP as an approach tries to use FDWs to leave data in their original locations when possible and only move the specific data when it cannot be processed in its original location. If it cannot eliminate data movement from source systems, PgDCP tries to reduce data movement to only a single data move into the PgDCP environment and then uses schemas, views, materialized views, etc. to transform data virtually instead of physically.
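As a hedged illustration of the "compute to data" idea, a foreign data wrapper can expose remote tables locally so they are queried in place (the server name, credentials, and schemas below are placeholders):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Placeholder connection details for a remote source system.
CREATE SERVER source_system
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'source.example.com', dbname 'erp', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER source_system
    OPTIONS (user 'readonly', password 'secret');

-- Surface remote tables without physically copying rows.
CREATE SCHEMA IF NOT EXISTS staging;
IMPORT FOREIGN SCHEMA public
    LIMIT TO (orders, customers)
    FROM SERVER source_system INTO staging;

-- Transform virtually: a view over the foreign tables.
CREATE VIEW recent_orders AS
SELECT o.*, c.name
FROM staging.orders o
JOIN staging.customers c USING (customer_id)
WHERE o.ordered_at > now() - interval '30 days';
```

The view performs the transformation at query time; only the rows a query actually needs ever leave the source system.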
Even when applying Unified Star Schemas (USS), Dimensional Modeling, Data Vault 2.0, or other techniques we try to use late-binding rather than early-binding when possible.
Our first choice of languages for writing data-centric microservices should be:
- Pure SQL views, materialized views, and stored routines using Postgraphile Schema Design.
- pgSQL views and materialized views and PL/pgSQL stored functions and stored procedures when pure SQL is not possible, using Postgraphile Schema Design.
In case SQL or PL/pgSQL is not appropriate:
- PL/Rust, PL/Java, PL/Go, PL/Deno or other type-safe PostgreSQL-hosted language should be prioritized.
- If type-safety is not possible, or access to 3rd-party libraries is more important than type-safety, then PL/Python, PL/Perl, and other languages should be considered.
- When using PL/Python or other languages with package managers, consider using guidance such as programmatic access to PIP modules so that PostgreSQL server admins are not required for administering module dependencies.
- The choice of language should depend on how easy the functionality can be expressed using Postgraphile Schema Design.
Cut Out the Middle Tier: Generating JSON Directly from Postgres is a good HN discussion about the benefits and pitfalls of the PgDCP approach.
PgDCP requires database-first security, which means PostgreSQL schemas, users, roles, permissions, and row-level security (RLS) should drive all data security requirements. Role-based access control (RBAC) and attribute-based access control (ABAC) should be implemented in PostgreSQL stored routines. If necessary, ldap2pg can be used to synchronize roles with LDAP.
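A minimal, hypothetical example of database-first security using roles and row-level security (the tenant model and session-variable name are assumptions for illustration):

```sql
CREATE ROLE app_user NOLOGIN;

CREATE TABLE invoice (
    invoice_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id  text NOT NULL,
    amount     numeric NOT NULL
);

ALTER TABLE invoice ENABLE ROW LEVEL SECURITY;

-- Each session sees only its tenant's rows; the tenant is taken
-- from a session setting established at authentication time.
CREATE POLICY tenant_isolation ON invoice
    FOR ALL TO app_user
    USING (tenant_id = current_setting('app.tenant_id', true));

GRANT SELECT, INSERT, UPDATE ON invoice TO app_user;
```

Because the policy lives in the database, every surface (GraphQL, REST, or direct wire protocol) inherits the same row-level guarantees.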
Because all our API functionality (except for serving the endpoints) will also be in the database we want to ensure that we secure views, stored procedures, and stored functions as if they were the API endpoints. OWASP API Security Project provides some great advice.
Zero Trust is a generally accepted cybersecurity approach that eliminates implicit trust in favor of continuously validating each stage of digital interactions. PgDCP encourages the same "never trust, always verify" posture with Zero Trust SQL (ztSQL). ztSQL is designed to protect database environments and enable faster development by allowing anyone to run any SQL while leveraging row-level security, attribute-based access control, role-based access control, and data segmentation within the database. Granular, "least access" policies should be implemented within the database so that Zero Trust Data Access (ZTDA) is possible.
If all access management is in the database, then securing access to the database is paramount. To that end, see:
- Securing Databases with Dynamic Credentials and HashiCorp Vault
- Using Vault to manage dynamic credentials of a Postgres
In PgDCP, PostgreSQL is not treated as a "bit bucket" where you just store data for an application. It's the center and most important part of our services' universe and requires a deliberate, disciplined, database-first change management approach. While there are many database migration tools like LiquiBase, Flyway, and Sqitch, they all assume that the problem should be solved in a database-independent manner outside of PostgreSQL. Since the PgDCP approach is to double-down on PostgreSQL and not worry about database portability, we want to perform PostgreSQL-first database change management.
First review Martin Fowler's Evolutionary Database Design article and then see tools like:
- Zero-downtime schema migrations in Postgres using views
- postgresql-dbpatch, which support conducting database changes and deploying them in a robust and automated way through the database instead of external tools
- Metagration, a PostgreSQL migration tool written in PostgreSQL
- Konfigraf, a Postgres extension that allows you to store and manipulate data in Git repositories stored in tables within the database. This is designed to be used for storage of configuration data for applications and services.
Observability of the database is important for forensics and quality assurance. Try to use simplified auditing based on SQL logging and FDWs to import logs instead of writing brittle custom triggers on each table/object. Separate observability into DDL changes, which can be alerted upon, as well as DML change logs which can be used for forensics. Wrap the observability of logs into a metrics table that can be scraped by Prometheus and used for alerting (e.g. whenever DDL changes occur, send alerts to anyone who needs to be informed).
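One possible sketch (schema, table, and trigger names are illustrative) of capturing DDL changes into a table that a Prometheus exporter could scrape and alert on:

```sql
CREATE SCHEMA IF NOT EXISTS observability;

CREATE TABLE observability.ddl_activity (
    occurred_at timestamptz NOT NULL DEFAULT now(),
    username    text NOT NULL DEFAULT current_user,
    command_tag text NOT NULL,
    object_id   text
);

-- Event trigger fires at the end of every DDL command and records
-- what changed, by whom, and when.
CREATE OR REPLACE FUNCTION observability.log_ddl()
RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO observability.ddl_activity (command_tag, object_id)
        VALUES (r.command_tag, r.object_identity);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER track_ddl
    ON ddl_command_end
    EXECUTE FUNCTION observability.log_ddl();
```

An exporter can then count rows in `observability.ddl_activity` per interval and raise alerts whenever unexpected DDL occurs.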
Especially important is to integrate OpenTelemetry trace IDs into each DDL and DML statement so that end-to-end traceability becomes native to the database. Being able to track context and propagation of SQL through service layers will be critical to maintaining high quality and reliability.
See semantic conventions for database client calls for how to provide traceability for database client calls and integrate W3C's Trace Context.
Modern applications demand type-safety (which is why PgDCP recommends TypeScript or Rust for applications). Since applications should be type-safe, we want our data models and database objects to also be type-safe. To enhance type-safety, create custom domains, custom enumerations or lookup tables based on inheritance, business types, and inheritable transaction tables ("table types"). Once you're using table inheritance you can use table inheritance wrapper functions.
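A hedged sketch of the kind of type-safety described above, using a custom domain, a custom enumeration, and an inheritable "table type" (the specific names and types are examples, not PgDCP requirements):

```sql
-- Custom domain: reject invalid values at the type level.
CREATE DOMAIN email_address AS text
    CHECK (VALUE ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$');

-- Custom enumeration for a closed value set.
CREATE TYPE order_status AS ENUM
    ('draft', 'submitted', 'fulfilled', 'cancelled');

-- A "table type" that transaction tables can inherit from.
CREATE TABLE base_transaction (
    created_at timestamptz NOT NULL DEFAULT now(),
    created_by text NOT NULL DEFAULT current_user
);

CREATE TABLE purchase_order (
    order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    buyer    email_address NOT NULL,
    status   order_status NOT NULL DEFAULT 'draft'
) INHERITS (base_transaction);
```

Invalid emails or statuses are rejected by the type system itself, so every client (GraphQL, REST, or wire protocol) gets the same guarantees for free.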
All microservice code in PostgreSQL tables, views, functions, and stored procedures will be surfaced through Postgraphile GraphQL first, but our AutoBaaS requirements are that all services should be exposed through safe and secure REST interfaces via PostgREST (or pREST) as a fallback for non-GraphQL clients. We favor Postgraphile's GraphQL API because it generates code which honors PostgreSQL security, roles, and unique features more faithfully than other utilities such as Hasura.
Promote the use of Database Lab Engine (DLE) to "provision independent non-production environments with multi-terabyte PostgreSQL databases in a few seconds." Every developer should be able to have their own easy-to-create-and-teardown development instances for experimentation purposes and to make sure scripts are idempotent and repeatable. If a database cannot be easily torn down and recreated in development and quality assurance environments, scripting is harder.
Part of the DLE is "Joe", which should be used by engineering and QA teams:
“Joe Bot”, a virtual DBA for SQL optimization, is a revolutionary new way to troubleshoot and optimize PostgreSQL query performance. Instead of running EXPLAIN or EXPLAIN (ANALYZE, BUFFERS) directly in production, users send queries for troubleshooting to Joe Bot. Joe Bot uses the Database Lab Engine (DLE) to:
- Generate a fresh thin clone
- Execute the query on the clone
- Return the resulting execution plan to the user
- The returned plan is identical to production in terms of structure and data volumes – this is achieved thanks to two factors: thin clones have the same data and statistics as production (at a specified point in time), and the PostgreSQL planner configuration on clones matches the production configuration.
All code in PostgreSQL should be tested, or assured, with pgTAP code. All Assurance Engineering Cases (AECs) should be written code-first, not human-first (what we call Assurance as Code). If Assurance is done within the database then tools like PostgreSQL Anonymizer should be used to help build test-first databases from production data when appropriate (see this PDF for further elaboration).
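A small illustrative pgTAP test; the table and function names here are hypothetical:

```sql
BEGIN;
SELECT plan(3);

-- Schema assurance: the objects we depend on exist.
SELECT has_table('app', 'customer', 'app.customer table exists');
SELECT has_function('app', 'customer_upsert', 'upsert routine exists');

-- Behavioral assurance: the routine does what it claims.
SELECT is(
    (SELECT app.customer_upsert('Jane', 'jane@example.com') IS NOT NULL),
    true,
    'customer_upsert returns an identifier'
);

SELECT * FROM finish();
ROLLBACK;
```

Running the plan inside a transaction that rolls back keeps Assurance as Code side-effect free and repeatable in any environment.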
Because our responses to bugs in the database which might lead to database crashes are only as good as the number of times we see such crashes, we should use tools like pg_crash, a "Chaos Monkey" extension for PostgreSQL databases. Per their repo, "pg_crash is an extension to PostgreSQL, which allows you to periodically or randomly crash your database infrastructure by sending kill (or other) signals to your DB processes and make them fail. It is ideal for HA and failover testing."
Microsoft Excel should be the first UI that all data access should be designed for when accessing outside of developer-centric PgDCP use cases. If Excel can properly show your data, in a safe, secure, and performant way, then every other client can also do so. Excel-first UX should target "live ODBC" use cases where the database is directly accessed using PostgreSQL binary protocol.
Good security practice for modern apps that will allow record IDs to be shared externally is to use UUID or shortkey (see below) non-serial primary keys. If you use a serial type primary key, never send the PK out for external consumption; always use surrogate keys via uuid-ossp or similar. If you use a serial PK and share the ID externally, it will be possible for external users to "guess" IDs since the PKs would be adjacent numerically.
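For example, a UUID primary key via uuid-ossp (the table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE account (
    -- Non-guessable identifier, safe to share externally.
    account_id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    -- Optional internal sequence kept private, e.g. for compact joins.
    internal_seq bigint GENERATED ALWAYS AS IDENTITY UNIQUE,
    display_name text NOT NULL
);
```

The UUID is the only identifier exposed through APIs; the sequential column, if kept at all, never leaves the database.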
Postgres Notify for Real Time Dashboards provides a great description of how to "push" updates to clients.
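The push pattern can be sketched with a trigger that emits NOTIFY events which a dashboard backend LISTENs for (the channel, function, and table names are illustrative):

```sql
CREATE OR REPLACE FUNCTION notify_metric_change()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Publish the changed row as JSON on the 'dashboard' channel.
    PERFORM pg_notify('dashboard', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$;

CREATE TRIGGER metric_changed
    AFTER INSERT OR UPDATE ON metrics
    FOR EACH ROW EXECUTE FUNCTION notify_metric_change();

-- A dashboard backend subscribes with:
LISTEN dashboard;
```

Clients holding a LISTEN connection receive each payload as soon as the transaction commits, with no polling required.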
Platform and site reliability engineers should review:
- psql command line tutorial and cheat sheet
- Postgres features showcase (commented SQL samples)
- postgres_dba set of useful tools for Postgres DBAs and all engineers
- Set of Practices for common PG engineering suggestions
- pgcenter CLI tool for observing and troubleshooting Postgres
- PGXN client CLI tool to interact with the PostgreSQL Extension Network
- Useful views and functions for PostgreSQL DBAs
- Postgres extension to introspect postgres log files from within the database
- Postgres clone schema utility without the need to go outside of the database. Makes developers' lives easier by running a single function to clone a schema with all its objects. It is very handy on Postgres RDS.
- An unassuming proposal for PLSQL Continuous Integration using revision control, Jenkins and Maven.
- Securing Your PostgreSQL Database
- Advanced multi-threaded PostgreSQL connection pooler and request router
- Postgres Container Apps: Easy Deploy Postgres Apps
Engineers writing stored routines (functions, SPs) should review:
- Boost your User-Defined Functions in PostgreSQL describes some useful techniques for improving UDFs.
- Building a recommendation engine inside Postgres with Python and Pandas
- postgresql-plpython-webservice shows how to cache web service requests via PL/Python
- Metagration, a PostgreSQL migration tool written in PostgreSQL
- BedquiltDB is a JSON document-store built on PostgreSQL using PL/Python
Engineers writing applications should consider these PostgreSQL-native libraries:
- makeExtendSchemaPlugin merges additional GraphQL types and resolvers into a Postgraphile PostgreSQL schema using a similar syntax to graphql-tools.
- uuid-ossp for UUIDs as primary keys
- Universally Unique Lexicographically Sortable Identifier (ULID)
- ltree for representing labels of data stored in a hierarchical tree-like structure
- pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching
- simplified auditing based on SQL logging and FDWs to import
logs instead of writing triggers
- Audit Trigger 91plus generic trigger function used for recording changes to tables into an audit log table
- pgMemento provides an audit trail for your data inside a PostgreSQL database using triggers and server-side functions written in PL/pgSQL
- Temporal Tables PostgreSQL Extension
- pg_cron to run periodic jobs in PostgreSQL
- shortkey for YouTube-like Short IDs as Postgres Primary Keys
- dexter automatic indexer
- message-db message and event store
- RecDB Recommendation Engine
- pg_similarity extension to support similarity queries on PostgreSQL
- dox Document Database API extension when needing simple JSON store
- colpivot.sql dynamic row to column pivotation/transpose
- Guidance to implement NIST level 2 RBAC Hierarchical RBAC in PostgreSQL
- ldap2pg to synchronize Postgres roles and privileges from YAML or LDAP
- SeeQR database analytic tool to compare the efficiency of different schemas and queries on a granular level
- postgres-basename-dirname contains functions which provide equivalents to the POSIX functions basename and dirname
- postgresql-similarity extension package which provides functions that calculate similarity between two strings
- pg_median_utils functions like median_filter which behaves the same as SciPy's medfilt
- orafce ORACLE compatibility functions
- postgres-typescript generates typescript functions from SQL files and lets you call these functions from your app
- SPARQL compiler functions for PostgreSQL
- PL/pgSQL implementation of hashids library, another alternative: plpg_hashids
- Helpers for PL/pgSQL applications
- session_variable Postgres database extension provides a way to create and maintain session scoped variables and constants, more or less like Oracle's global variables
- Git Based Application Configuration, a Postgres extension that allows you to store and manipulate data in Git repositories stored in tables within the database. This is designed to be used for storage of configuration data for applications and services.
- PostgreSQL extension with support for version string comparison
Engineers writing SQL-first code should use the following tools:
- sqlcheck and plpgsql_check for linting SQL source code
- pgTAP - Database testing framework for Postgres
- pgcmp for comparing Postgres database schemas
- Web-based Explain Visualizer (pev) and CLI query visualizer (gocmdpev) for performance optimization
- JSON Schema validation for PostgreSQL when using JSON and JSONB columns
- Use readable database errors as a guide for creating errors in the database which can be used in the front-end
- postgresqltuner script to analyse PostgreSQL database configuration, and give tuning advice
- Use HyperLogLog data structures and TopN PostgreSQL extension for higher performing value counting when data amounts get large
- See GraphQL for Postgres which teaches techniques for how to parse GraphQL queries and transform them into SQL, all inside PostgreSQL (this is not production-level code but is good for education)
- Plugin for prettier to support formatting of PostgreSQL-flavour SQL, including function bodies in SQL, pgSQL, PLV8, plpython, etc.
Engineers needing to instrument PostgreSQL:
Machine Learning without leaving PostgreSQL:
- Apache MADlib
- mindsdb.com for machine Learning without leaving the database
Content engineers who need datasets:
- pgloader loads data from various sources into PostgreSQL
- ISO-3166 - All countries and subcountries in the world
Precision Knowledge:
- Lesser Known PostgreSQL Features
- Evolutionary Database Design
- Intuit’s Data Mesh Strategy
- Web development platform built entirely in PostgreSQL
- Postgres Analytics - Tips, best practices & extensions
- OWASP API Security Project
- Beyond REST is Netflix's approach to Rapid Development with GraphQL Microservices
- 6 Common Mistakes for SQL Queries that "Should be Working"
- Postgres Notify for Real Time Dashboards
- Postgres regex search over 10,000 GitHub repositories (using only a Macbook)
- How to Modernize Your Data & Analytics Platform parts one, two, three, and four.
If a custom microservice is completely stateless and does not have anything to do with reading or writing structured data, it should be written in TypeScript hosted on Deno, or as another type of microservice using the Microsoft Dapr sidecar.