Opinionated SQL-powered productive roll-forward migration tool for PostgreSQL.
To help us develop this software sustainably under the MIT license, we ask all individuals and businesses that use it to help support its ongoing maintenance and development via sponsorship.
And please give some love to our featured sponsors 🤩:
Chad Furman | Timescale
- fast iteration speed — save a file and database is updated in milliseconds
- roll-forward only — maintaining rollbacks is a chore, and in 10 years of API development I've never run one in production
- familiar — no custom DSL to learn, just use PostgreSQL syntax
- fully functional — sending SQL commands directly to PostgreSQL means you can use all of PostgreSQL's features
- complements PostGraphile — works with any application, but PostGraphile's watch mode means that the GraphQL schema is instantly regenerated (without server restart) whenever the database changes
HIGHLY EXPERIMENTAL
If you're a sponsor and you're using this software, let me know so I can justify allocating additional time to it.
The interface is raw and does not ask for confirmation (e.g. the `graphile-migrate reset` command will drop and re-create the database without prompting).
There are no automated tests yet, and APIs may still change. (Pull requests to add tests welcome!)
Using this for prototyping should be fine, but when it comes to shipping you may want to
- help us write tests and finalise interfaces
- send us money to do the same
- use an alternative migration framework, such as:
- Local iteration should be easy and fast
- Migrating should be fast
- Once deployed, databases should be identical (including subtleties such as column order)
- Migration software should not be tied to a particular application stack
- Migrations should be written in SQL
- Roll-forward only (production issues should be fixed via additional migrations, development can iterate current migration)
- Once a migration is signed off (deployable) it should never be edited
- Use PostgreSQL ;)
- Development databases are cheap; can run multiple
- Resetting development database is acceptable if absolutely necessary
- Production databases are critical - NEVER RESET
- Migrating data (as well as DDL) is acceptable, but should be kept to fast operations (or trigger a background job)
- Migrations should automatically be wrapped in transactions by default
- Migrations that require execution outside of a transaction (e.g. to enable augmenting non-DDL-safe things, such as `ENUM`s in PostgreSQL) should be explicitly marked
- Migrations should not pollute PostgreSQL global settings (e.g. use `SET LOCAL` rather than `SET`)
- Roles should be managed outside of migrations (since they can be shared between databases)
- Certain schemas are managed by other tools and should not be interfered with; e.g. `graphile_jobs`
`graphile-migrate` requires two databases: the first is your main database
against which you perform development; the second is a "shadow" database
which is used by the system to apply migrations. You should never interact
with the "shadow" database directly. Further, all members of your team should
run the same PostgreSQL version to ensure that the shadow dump matches for
everyone (one way of achieving this is through Docker, but that isn't
required).
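For example, you might point graphile-migrate at the two databases (plus a superuser connection used for resets) via environment variables; the servers, credentials and database names below are placeholders only:

```bash
# Example only — substitute your own server, credentials and database names
export ROOT_DATABASE_URL="postgres://localhost/template1"                           # superuser connection
export DATABASE_URL="postgres://my_user:my_password@localhost/my_db"                # main development database
export SHADOW_DATABASE_URL="postgres://my_user:my_password@localhost/my_db_shadow"  # shadow database
```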
Runs any un-executed committed migrations. Does not run `current.sql`. For use in production and development.

If `--shadow` is specified, migrates the shadow database instead.

If `--force` is specified, it will run any `afterAllMigrations` actions even if no migrations are actually run.
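For example:

```bash
# Apply any pending committed migrations to the main database
graphile-migrate migrate

# Apply them to the shadow database, running afterAllMigrations actions even if nothing was pending
graphile-migrate migrate --shadow --force
```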
Runs any un-executed committed migrations and then runs and watches `current.sql`, re-running its contents on any change.

`current.sql` should be idempotent (this is your responsibility, see "Idempotency" below); i.e. it should be able to be run multiple times and have the same result.

If `--shadow` is specified, changes will be applied against the shadow database instead.

If `--once` is specified, `current.sql` will be run once and then the command will exit.
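For example:

```bash
# Watch current.sql and re-run it against the development database on every save
graphile-migrate watch

# Run current.sql once against the shadow database, then exit
graphile-migrate watch --shadow --once
```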
- reset the shadow database to the latest dump
- apply the current migration to the shadow database, and replace the dump
- move the current migration to committed migrations (adding a hash to prevent tampering)
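Typically this is a single command, run once your migration is ready to be signed off:

```bash
# Verify current.sql against the shadow database, then move it into the committed migrations
graphile-migrate commit
```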
Drop and re-create the database, and re-run all the committed migrations from the start. HIGHLY DESTRUCTIVE.

If `--shadow` is specified, the shadow database will be reset rather than the main database.
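For example:

```bash
# Drop and rebuild the development database from the committed migrations — HIGHLY DESTRUCTIVE
graphile-migrate reset

# Do the same for the shadow database only
graphile-migrate reset --shadow
```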
[EXPERIMENTAL!]
Exits with a bitmap status code indicating statuses:
- 1 if there are committed migrations that have not been executed yet
- 2 if the `current.sql` file is non-empty (ignoring comments)

If both of the above are true then the exit status will be 3 (1+2). If neither is true, the exit status will be 0 (success).
Also outputs helpful messages:
There are 3 committed migrations pending:
000001.sql
000002.sql
000003.sql
The current.sql migration is not empty and has not been committed.
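Because the exit code is a bitmap, a CI or deploy script can inspect the individual bits; the snippet below is an illustrative sketch, not an official recipe:

```bash
# Fail the build if migrations are pending or current.sql has uncommitted work
graphile-migrate status || status=$?
if [ $(( ${status:-0} & 1 )) -ne 0 ]; then echo "committed migrations are pending"; fi
if [ $(( ${status:-0} & 2 )) -ne 0 ]; then echo "current.sql is not empty"; fi
exit "${status:-0}"
```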
It's possible to consume this module as a JavaScript library rather than via
the CLI. There's no documentation on this, but the CLI code in cli.ts
is very
approachable.
ALPHA WARNING: internals are likely to change a lot, so expect breakage if you use library mode right now. CLI is more stable.
Configuration goes in `.gmrc`, which is a JSON file with the following keys:
- `connectionString` (or `DATABASE_URL` envvar) — this is your main development database. If you run `graphile-migrate reset` this will be dropped without warning, so be careful.
- `shadowConnectionString` (or `SHADOW_DATABASE_URL` envvar) — the shadow database which will be dropped frequently, so don't store anything to it that you care about. This database should not already exist.
- `rootConnectionString` (or `ROOT_DATABASE_URL` envvar) — this is used to connect to the database server with superuser privileges to drop and re-create the relevant databases (via the `reset` command directly, or via the `commit` command for the shadow database). It must not be a connection to the database in `connectionString` or `shadowConnectionString`. It defaults to "template1" if the key or environment variable is not set, so it may result in PG connection errors if a default PG `template1` database is not available.
- `pgSettings` — optional string-string key-value object defining settings to set in PostgreSQL when migrating. Useful for setting `search_path` for example. Beware of changing this: a full reset will use the new values, which may lead to unexpected consequences.
- `placeholders` — optional string-string key-value object defining placeholder values to be replaced when encountered in any migration files. Placeholders must begin with a colon and a capital letter, and then can continue with a string of capital letters, numbers and underscores (`/^:[A-Z][A-Z0-9_]+$/`). `:DATABASE_NAME` and `:DATABASE_OWNER` are automatically added to this object. The value must be valid in the place you use it (i.e. ensure you escape the values) — graphile-migrate does not perform any escaping for you. The special value `!ENV` will tell graphile-migrate to load the setting from the environment variable with the same name.
- `afterReset` — optional list of actions to execute after the database has been created but before the migrations run; useful to set default permissions, install extensions or install external schemas like `graphile-worker` that your migrations may depend on. See "Actions" below.
- `afterAllMigrations` — optional list of actions to execute after all the migrations have run; useful for performing tasks like dumping the database or regenerating dependent data (GraphQL schema, type definitions, etc). See "Actions" below.
What follows is an example configuration file that depends on the following environmental variables being set:
- `ROOT_DATABASE_URL` - equivalent to `rootConnectionString` above, e.g. `postgres://localhost/template1`
- `DATABASE_URL` - equivalent to `connectionString` above, e.g. `postgres://my_user:my_password@localhost/my_db`
- `SHADOW_DATABASE_URL` - equivalent to `shadowConnectionString` above, e.g. `postgres://my_user:my_password@localhost/my_db_shadow` (should use the same credentials as the main database)
{
"pgSettings": {
"search_path": "app_public,app_private,app_hidden,public"
},
"placeholders": {
":DATABASE_AUTHENTICATOR": "!ENV",
":DATABASE_VISITOR": "!ENV"
},
"afterReset": [
"afterReset.sql",
{ "_": "command", "command": "npx --no-install graphile-worker --once" }
],
"afterAllMigrations": [
{
"_": "command",
"command": "pg_dump --schema-only --no-owner --exclude-schema=graphile_migrate --file=data/schema.sql \"$GM_DBURL\""
}
]
}
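With a configuration like the above, a migration can reference the placeholders directly; graphile-migrate substitutes them before the SQL is executed. The role behind `:DATABASE_VISITOR` in this sketch is assumed to exist already (e.g. created in `afterReset.sql`):

```sql
-- Sketch: grant the visitor role (substituted from :DATABASE_VISITOR) access to app_public
GRANT USAGE ON SCHEMA app_public TO :DATABASE_VISITOR;
GRANT SELECT ON ALL TABLES IN SCHEMA app_public TO :DATABASE_VISITOR;
```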
We support certain "actions" after certain events happen; for example see `afterReset` and `afterAllMigrations` mentioned above. Actions should be specified as a list of strings or action spec objects.
String values are converted to `sql` action specs (see below) with the `file` property set to the string; i.e. they indicate a file within the migrations folder to execute against the database.
Action spec objects are plain JSON objects with the following properties:
- `_` - specifies the type of object (see supported types below)
- `shadow` (optional) - if set, must be a boolean; `true` indicates the action should only occur against the shadow DB, `false` indicates that the action should not occur against the shadow DB; unset runs against both databases

Each action spec subtype can have its own properties.
e.g.
{
"_": "sql",
"file": "install_extensions.sql"
}
The `file` property indicates the name of a SQL file in the `migrations/` folder to execute against the database (e.g. to set permissions, load data, install extensions, etc).
e.g.
{
"_": "command",
"command": "npx --no-install graphile-worker --once"
}
`command` actions specify shell actions (e.g. running an external command such as `graphile-worker` which might install a separately managed worker schema into the database, or running something like `pg_dump` to dump the schema).
When the command is invoked it will have access to the following envvars:
- `GM_DBURL` - the relevant database URL (e.g. the one that was just reset/migrated)
- `GM_DBNAME` - the database name in `GM_DBURL`; you might use this if you need to use separate superuser credentials to install extensions against the database
- `GM_DBUSER` - the database user in `GM_DBURL`
- `GM_SHADOW` - set to `1` if we're dealing with the shadow DB, unset otherwise
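As an illustration of `GM_DBNAME` (a sketch rather than an official recipe — the superuser connection details are assumptions), an `afterReset` command action could install an extension using separate superuser credentials:

```json
{
  "_": "command",
  "command": "psql -X -v ON_ERROR_STOP=1 \"postgres://postgres@localhost/$GM_DBNAME\" -c 'CREATE EXTENSION IF NOT EXISTS citext;'"
}
```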
The intention is that developers can work on different migrations in parallel,
and can switch between git
branches - idempotent migrations using CASCADE
when dropping should make it possible to do this with little issue (other than
the implicit data loss of dropping tables/columns/etc).
`graphile-migrate commit`, on the other hand, should be linear - one way to
approach this is to only commit a migration immediately before it is merged to
`master`. Another approach is to do the commit on `master` itself. Non-linear
migration commits will result in errors, and may lead to you resetting your
development database.
`graphile-migrate` is all about iteration; you write your database modification commands in `migrations/current.sql` and every time you save, it is run against the database, generally taking under 100ms.

Because we run the same script over and over (on every save) and there are no down migrations, you need to make your script idempotent. PostgreSQL has a number of idempotent commands, such as:
create or replace function...
drop table if exists ...
drop trigger if exists ...
-- etc
When these aren't suitable you can start your migration with an explicit rollback: commands that undo later actions. For example:
-- undo
drop table if exists people;
-- redo
create table people (
id serial primary key,
name text
);
When it comes time to commit your migration we will run it against a "shadow" database to make sure it's valid.
It's often wise to use DROP ... CASCADE
so that if other migrations are
worked on in parallel no additional rollback
step is required. When you
DROP ... CASCADE
, be sure to add back any dropped dependents (triggers,
indexes, etc) once the dropped entity has been replaced. Reviewing the database
schema diff can help you spot these issues.
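For instance (a sketch using made-up names, assuming a `people` table with an `updated_at` column), dropping a trigger function with `CASCADE` also drops the triggers that use it, so they must be re-created afterwards:

```sql
-- Replacing a trigger function: CASCADE will also drop dependent triggers
DROP FUNCTION IF EXISTS set_updated_at() CASCADE;

CREATE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at = now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Re-add the trigger that the CASCADE removed
DROP TRIGGER IF EXISTS _100_set_updated_at ON people;
CREATE TRIGGER _100_set_updated_at
  BEFORE UPDATE ON people
  FOR EACH ROW EXECUTE PROCEDURE set_updated_at();
```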
More examples of idempotent operations:
-- Create a schema
DROP SCHEMA IF EXISTS app CASCADE;
CREATE SCHEMA app;
-- Create a table
DROP TABLE IF EXISTS foo CASCADE;
CREATE TABLE foo ...;
-- Add a column to the end of the table
ALTER TABLE foo DROP COLUMN IF EXISTS bar CASCADE;
ALTER TABLE foo ADD COLUMN bar ...;
-- Make a column NOT NULL
ALTER TABLE foo ALTER COLUMN foo SET NOT NULL;
-- Alter a column type
ALTER TABLE foo ALTER COLUMN foo TYPE int USING foo::int;
-- Change the body or flags of a function
CREATE OR REPLACE FUNCTION ...;
-- Change a function signature (arguments, return type, etc)
DROP FUNCTION IF EXISTS ... CASCADE;
CREATE OR REPLACE FUNCTION ...
Some migrations require execution outside of a transaction (e.g. to enable augmenting non-DDL-safe things, such as `ENUM`s in PostgreSQL). To disable wrapping a given migration file in a transaction, use the special comment `--! no-transaction`
at the top of the migration file, e.g.
--! no-transaction
ALTER TYPE user_role ADD VALUE IF NOT EXISTS 'Admin';
- Use a proper CLI parsing library
- Store `pgSettings` with committed transactions to protect against user edits
- Add automated tests
- Add `graphile-migrate check` command: reset the shadow database to the latest dump, apply the current migration to the shadow database, and output a SQL schema diff you can use to ensure no accidental changes have been made
- Add `graphile-migrate init` command: ask questions and set up the relevant files for running graphile-migrate
- Add `graphile-migrate import` command: used after init but before running any other commands, imports the existing database as if it were the first migration. (For now just pg_dump, and put the schema in migrations/schema.sql.)