IntegreSQL
manages isolated PostgreSQL databases for your integration tests.
Do your engineers a favour by allowing them to write fast executing, parallel and deterministic integration tests utilizing real PostgreSQL test databases. Resemble your live environment in tests as close as possible.
- IntegreSQL
- Background
- Approach 0: Leaking database mutations for subsequent tests
- Approach 1: Isolating by resetting
- Approach 2a: Isolation by transactions
- Approach 2b: Isolation by mocking
- Approach 3a: Isolation by templates
- Approach 3b: Isolation by cached templates
- Approach 3c: Isolation by cached templates and pool
- Final approach: IntegreSQL
- Install
- Configuration
- Usage
- Contributing
- Maintainers
- License
- Background
We came a long way to realize that something just did not feel right with our PostgreSQL integration testing strategies. This is a loose summary of how this project came to life.
Testing our customer backends actually started quite simple:
- Test runner starts
- Recreate a PostgreSQL test database
- Apply all migrations
- Seed all fixtures
- Utilizing the same PostgreSQL test database for each test:
- Run your test code
- Test runner ends
It's quite easy to spot the problem with this approach. Data may be mutated by any single test and is visible from all subsequent tests. It becomes cumbersome to make changes in your test code if you can't rely on a clean state in each and every test.
Let's try to fix that like this:
- Test runner starts
- Recreate a PostgreSQL test database
- Before each test:
- Truncate
- Apply all migrations
- Seed all fixtures
- Utilizing the same PostgreSQL test database for each test:
- Run your test code
- Test runner ends
Well, it's now isolated - but testing time has increased by a rather high factor and is totally dependent on your truncate/migrate/seed operations.
What about using database transactions?
- Test runner starts
- Recreate a PostgreSQL test database
- Apply all migrations
- Seed all fixtures
- Before each test:
- Start a new database transaction
- Utilizing the same PostgreSQL test database for each test:
- Run your test code
- After each test:
- Rollback the database transaction
- Test runner ends
After spending various time to rewrite all code to actually use the injected database transaction in each code, you realize that nested transactions are not supported and can only be poorly emulated using save points. All database transaction specific business code, especially their potential error state, is not properly testable this way. You therefore ditch this approach.
What about using database mocks?
- Test runner starts
- Utilizing an in-memory mock database isolated for each test:
- Run your test code
- Test runner ends
I'm generally not a fan of emulating database behavior through a mocking layer while testing/implementing. Even minor version changes of PostgreSQL plus it's extensions (e.g. PostGIS) may introduce slight differences, e.g. how indices are used, function deprecations, query planner, etc. . It might not even be an erroneous result, just performance regressions or slight sorting differences in the returned query result.
We try to approximate local/test and live as close as possible, therefore using the same database, with the same extensions in their exact same version is a hard requirement for us while implementing/testing locally.
We discovered that using PostgreSQL templates and creating the actual new test database from them is quite fast, let's to this:
- Test runner starts
- Recreate a PostgreSQL template database
- Apply all migrations
- Seed all fixtures
- Before each test:
- Create a new PostgreSQL test database from our already migrated/seeded template database
- Utilizing a new isolated PostgreSQL test database for each test:
- Run your test code
- Test runner ends
Well, we are up in speed again, but we still can do better, how about...
- Test runner starts
- Check migrations/fixtures have changed (hash over all related files)
- Yes
- Recreate a PostgreSQL template database
- Apply all migrations
- Seed all fixtures
- No, nothing has changed
- Simply reuse the previous PostgreSQL template database
- Yes
- Before each test:
- Create a new PostgreSQL test database from our already migrated/seeded template database
- Utilizing a new isolated PostgreSQL test database for each test:
- Run your test code
- Test runner ends
This gives a significant speed bump as we no longer need to recreate our template database if no files related to the database structure or fixtures have changed. However, we still need to create a new PostgreSQL test database from a template before running any test. Even though this is quite fast, could we do better?
- Test runner starts
- Check migrations/fixtures have changed (hash over all related files)
- Yes
- Recreate a PostgreSQL template database
- Apply all migrations
- Seed all fixtures
- No, nothing has changed
- Simply reuse the previous PostgreSQL template database
- Yes
- Create a pool of n PostgreSQL test databases from our already migrated/seeded template database
- Before each test:
- Select the first new PostgreSQL test database that is ready from the test pool
- Utilizing your selected PostgreSQL test database from the test pool for each test:
- Run your test code
- After each test:
- If there are still tests lefts to run add some additional PostgreSQL test databases from our already migrated/seeded template database
- Test runner ends
Finally, by keeping a warm pool of test database we arrive at the speed of Approach 0, while having the isolation gurantees of all subsequent approaches. This is actually the (simplified) strategy, that we have used in allaboutapps-backend-stack for many years.
Here's a quick benchmark of how this strategy typically performed back then:
--- ----------------<storageHelper strategy report>---------------- ---
replicas switched: 50 avg=11ms min=1ms max=445ms
replicas awaited: 1 prebuffer=8 avg=436ms max=436ms
background replicas: 58 avg=272ms min=41ms max=474ms
- warm up template (cold): 82% 2675ms
* truncate: 62% 2032ms
* migrate: 18% 594ms
* seed: 1% 45ms
- switching: 17% 571ms
* disconnect: 1% 42ms
* switch replica: 14% 470ms
- resolve next: 1% 34ms
- await next: 13% 436ms
* reinitialize: 1% 57ms
strategy related time: --- 3246ms
vs total executed time: 20% 15538ms
--- ---------------</ storageHelper strategy report>--------------- ---
This is a rather small testsuite with 50
tests and with a tiny database. Thus the whole test run was finished in ~15sec
. ~2.7sec
were spend setting up the template within the warm up (truncate + migrate + seed) and ~0.6sec
in total waiting for a new test/replica databases to become available for a test. We spend ~20%
of our total execution time running / waiting inside our test strategy approach.
This a cold start. You pay for this warm-up flow only if no template database was cached by a previous test run (if your migrations + fixtures files - the hash
over these files - hasn't changed).
A new test database (called a replica here) from this tiny template database took max. ~500ms
to create, on avg. this was ~halfed and most importantly can be done in the background (while some tests already execute).
The cool thing about having a warm pool of replicas setup in the background, is that selecting new replicas from the pool is blazingly fast, as typically they will be already ready when it's time to execute the next test. For instance, it took ~500ms
max. and 11ms
on avg. to select a new replica for all subsequent tests (we only had to wait once until a replica became available for usage within a test - typically it's the first test to be executed).
Let's look at a sightly bigger testsuite and see how this approach may possibly scale:
--- -----------------<storageHelper strategy report>------------------ ---
replicas switched: 280 avg=26ms min=11ms max=447ms
replicas awaited: 1 prebuffer=8 avg=417ms max=417ms
background replicas: 288 avg=423ms min=105ms max=2574ms
- warm up template (cold): 40% 5151ms
* truncate: 8% 980ms
* migrate: 26% 3360ms
* seed: 4% 809ms
- switching: 60% 7461ms
* disconnect: 2% 322ms
* switch replica: 6% 775ms
- resolve next: 2% 358ms
- await next: 3% 417ms
* reinitialize: 50% 6364ms
strategy related time: --- 12612ms
vs total executed time: 11% 111094ms
--- ----------------</ storageHelper strategy report>----------------- ---
This test suite is larger and comes with 280
tests, the whole test run finished in ~1m50s
(~390ms
per test on avg.). ~5.2sec
were spend setting up the template and ~7.5sec
in total waiting for a new test / replica databases to become available for a test.
The rise in switching time is expected, as we need way more replicas / test databases this time, however we only spend ~11%
running / waiting inside our test strategy approach. To put that into perspective, each test only had to wait ~26ms
on avg. until it could finally execute (and typically, this is solely the time it needs to open up a new database connection).
This should hopefully give you some base understanding on why we consider this testing approach essential for our projects. It's the sweet combination of speed and isolation.
We realized that having the above pool logic directly within the test runner is actually counterproductive and is further limiting usage from properly utilizing parallel testing (+performance).
As we switched to Go as our primary backend engineering language, we needed to rewrite the above logic anyways and decided to provide a safe and language agnostic way to utilize this testing strategy with PostgreSQL.
IntegreSQL is a RESTful JSON API distributed as Docker image or go cli. It's language agnostic and manages multiple PostgreSQL templates and their separate pool of test databases for your tests. It keeps the pool of test databases warm (as it's running in the background) and is fit for parallel test execution with multiple test runners / processes.
Our flow now finally changed to this:
- Start IntegreSQL and leave it running in the background (your PostgreSQL template and test database pool will always be warm)
- ...
- 1..n test runners start in parallel
- Once per test runner process
- Get migrations/fixtures files
hash
over all related database files InitializeTemplate: POST /templates
: attempt to create a new PostgreSQL template database identifying though the above hashpayload: {"hash": "string"}
StatusOK: 200
- Truncate
- Apply all migrations
- Seed all fixtures
FinalizeTemplate: PUT /templates/{hash}
- If you encountered any template setup errors call
DiscardTemplate: DELETE /templates/{hash}
StatusLocked: 423
- Some other process has already recreated a PostgreSQL template database for this
hash
(or is currently doing it), you can just consider the template ready at this point.
- Some other process has already recreated a PostgreSQL template database for this
StatusServiceUnavailable: 503
- Typically happens if IntegreSQL cannot communicate with PostgreSQL, fail the test runner process
- Get migrations/fixtures files
- Before each test
GetTestDatabase: GET /templates/{hash}/tests
- Blocks until the template database is finalized (via
FinalizeTemplate
) StatusOK: 200
- You get a fully isolated PostgreSQL database from our already migrated/seeded template database to use within your test
StatusNotFound: 404
- Well, seems like someone forgot to call
InitializeTemplate
or it errored out.
- Well, seems like someone forgot to call
StatusGone: 410
- There was an error during test setup with our fixtures, someone called
DiscardTemplate
, thus this template cannot be used.
- There was an error during test setup with our fixtures, someone called
StatusServiceUnavailable: 503
- Well, typically a PostgreSQL connectivity problem
- Blocks until the template database is finalized (via
- Utilizing the isolated PostgreSQL test database received from IntegreSQL for each (parallel) test:
- Run your test code
- After each test optional:
ReturnTestDatabase: DELETE /templates/{hash}/tests/{test-database-id}
- Marks the test database that it can be wiped early on pool limit overflow (or reused if
true
is submitted)
- Marks the test database that it can be wiped early on pool limit overflow (or reused if
- 1..n test runners end
- ...
- Subsequent 1..n test runners start/end in parallel and reuse the above logic
The flow above might look intimidating at first glance, but trust us, it's simple to integrate especially if there is already an client library available for your specific language. We currently have those:
- Go: integresql-client-go by Nick Müller - @MorpheusXAUT
- Python: integresql-client-python by Marcin Sztolcman - @msztolcman
- ... Add your link here and make a PR
A really good starting point to write your own integresql-client for a specific language can be found here (go code) and here (godoc). It's just RESTful JSON after all.
If you want to take a look on how we integrate IntegreSQL - 🤭 - please just try our go-starter project or take a look at our testing setup code.
A minimal Docker image containing a pre-built IntegreSQL
executable is available at Docker Hub.
docker pull allaboutapps/integresql
Installing IntegreSQL
locally requires a working Go (1.14 or above) environment. Install the IntegreSQL
executable to your Go bin folder:
go get github.com/allaboutapps/integresql/cmd/server
IntegreSQL
requires little configuration, all of which has to be provided via environment variables (due to the intended usage in a Docker environment). The following settings are available:
Description | Environment variable | Default | Required |
---|---|---|---|
IntegreSQL: listen address (defaults to all if empty) | INTEGRESQL_ADDRESS |
"" |
|
IntegreSQL: port | INTEGRESQL_PORT |
5000 |
|
PostgreSQL: host | INTEGRESQL_PGHOST , PGHOST |
"127.0.0.1" |
Yes |
PostgreSQL: port | INTEGRESQL_PGPORT , PGPORT |
5432 |
|
PostgreSQL: username | INTEGRESQL_PGUSER , PGUSER , USER |
"postgres" |
Yes |
PostgreSQL: password | INTEGRESQL_PGPASSWORD , PGPASSWORD |
"" |
Yes |
PostgreSQL: database for manager | INTEGRESQL_PGDATABASE |
"postgres" |
|
PostgreSQL: template database to use | INTEGRESQL_ROOT_TEMPLATE |
"template0" |
|
Managed databases: prefix | INTEGRESQL_DB_PREFIX |
"integresql" |
|
Managed template databases: prefix integresql_template_<HASH> |
INTEGRESQL_TEMPLATE_DB_PREFIX |
"template" |
|
Managed test databases: prefix integresql_test_<HASH>_<ID> |
INTEGRESQL_TEST_DB_PREFIX |
"test" |
|
Managed test databases: username | INTEGRESQL_TEST_PGUSER |
PostgreSQL: username | |
Managed test databases: password | INTEGRESQL_TEST_PGPASSWORD |
PostgreSQL: password | |
Managed test databases: minimal test pool size | INTEGRESQL_TEST_INITIAL_POOL_SIZE |
10 |
|
Managed test databases: maximal test pool size | INTEGRESQL_TEST_MAX_POOL_SIZE |
500 |
Simply start the IntegreSQL
Docker (19.03 or above) container, provide the required environment variables and expose the server port:
docker run -d --name integresql -e INTEGRESQL_PORT=5000 -p 5000:5000 allaboutapps/integresql
IntegreSQL
can also be included in your project via Docker Compose (1.25 or above):
version: "3.4"
services:
# Your main service image
service:
depends_on:
- postgres
- integresql
environment:
PGDATABASE: &PGDATABASE "development"
PGUSER: &PGUSER "dbuser"
PGPASSWORD: &PGPASSWORD "9bed16f749d74a3c8bfbced18a7647f5"
PGHOST: &PGHOST "postgres"
PGPORT: &PGPORT "5432"
PGSSLMODE: &PGSSLMODE "disable"
# optional: env for integresql client testing
# see https://github.com/allaboutapps/integresql-client-go
# INTEGRESQL_CLIENT_BASE_URL: "http://integresql:5000/api"
# [...] additional main service setup
integresql:
image: allaboutapps/integresql:1.0.0
ports:
- "5000:5000"
depends_on:
- postgres
environment:
PGHOST: *PGHOST
PGUSER: *PGUSER
PGPASSWORD: *PGPASSWORD
postgres:
image: postgres:12.2-alpine # should be the same version as used live
# ATTENTION
# fsync=off, synchronous_commit=off and full_page_writes=off
# gives us a major speed up during local development and testing (~30%),
# however you should NEVER use these settings in PRODUCTION unless
# you want to have CORRUPTED data.
# DO NOT COPY/PASTE THIS BLINDLY.
# YOU HAVE BEEN WARNED.
# Apply some performance improvements to pg as these guarantees are not needed while running locally
command: "postgres -c 'shared_buffers=128MB' -c 'fsync=off' -c 'synchronous_commit=off' -c 'full_page_writes=off' -c 'max_connections=100' -c 'client_min_messages=warning'"
expose:
- "5432"
ports:
- "5432:5432"
environment:
POSTGRES_DB: *PGDATABASE
POSTGRES_USER: *PGUSER
POSTGRES_PASSWORD: *PGPASSWORD
volumes:
- pgvolume:/var/lib/postgresql/data
volumes:
pgvolume: # declare a named volume to persist DB data
You may also refer to our go-starter docker-compose.yml
.
Running the IntegreSQL
server locally requires configuration via exported environment variables (see below):
export INTEGRESQL_PORT=5000
export PGHOST=127.0.0.1
export PGUSER=test
export PGPASSWORD=testpass
integresql
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
IntegreSQL
requires the following local setup for development:
- Docker CE (19.03 or above)
- Docker Compose (1.25 or above)
The project makes use of the devcontainer functionality provided by Visual Studio Code so no local installation of a Go compiler is required when using VSCode as an IDE.
Should you prefer to develop IntegreSQL
without the Docker setup, please ensure a working Go (1.14 or above) environment has been configured as well as a PostgreSQL instance is available (tested against version 12 or above, but should be compatible to lower versions) and the appropriate environment variables have been configured as described in the Install section.
- Start the local docker-compose setup and open an interactive shell in the development container:
# Build the development Docker container, start it and open a shell
./docker-helper.sh --up
- Initialize the project, downloading all dependencies and tools required (executed within the dev container):
# Init dependencies/tools
make init
# Build executable (generate, format, build, vet)
make
- Execute project tests and start server:
# Execute tests
make test
# Run IntegreSQL server with config from environment
integresql
MIT © 2020 aaa – all about apps GmbH | Nick Müller | Mario Ranftl and the IntegreSQL
project contributors