#timescaledb-backup
timescaledb-backup
is a program for making dumping and restoring a
TimescaleDB database simpler, less error-prone,
and more performant. In particular, the current use of vanilla PostgreSQL tools
pg_dump
and
pg_restore
has several
limitations when applied to TimescaleDB:
- The PostgreSQL backup/restore tools do not support backup/restore across versions of extensions. So that if you take a backup from (say) TimescaleDB v1.7.1, you need to restore to a database version that is also running TimescaleDB v1.7.1, and then manually upgrade TimescaleDB to a later version.
- The backup/restore tools do not track which version of TimescaleDB is in the backup, so a developer needs to maintain additional external information to ensure the proper restore process.
- Users need to take manual steps to run pre- and post-restore hooks (database functions) in TimescaleDB to ensure correct behavior. Failure to execute these hooks can prevent restores from functioning correctly.
- The restore process cannot easily perform parallel restoration for greater speed/efficiency.
Towards this end, timescaledb-backup
overcomes many of these current
limitations. It continues to use pg_dump
and pg_restore
, but properly wraps
them to:
- Track automatically the version of TimescaleDB internally in the information dumped, to ensure that the proper version is always restored.
- Run all pre and post restore hooks at their proper times during a restore; and
- Enable parallel restore by properly sequencing catalog and data restores during the restore process.
You can install by running go get
$ go get github.com/timescale/timescaledb-backup/
Or by downloading a binary at our release page
It will also be distributed with a number of our tools in timescaledb-tools
; yum, apt, Homebrew,etc.
- You will need binaries for
pg_dump
,pg_dumpall
, andpg_restore
installed where you are runningtimescaledb-backup
- The target database needs the
.so
file of the dumped version so that we can restore to the correct version. It will also need the.so
of your target version.
First create a dump using the ts-dump
command, for those used to using pg_dump
, the
options are pared down significantly you will need to provide the following parameters:
--db-URI
the database connection string in Postgres URI format to connect to. The Postgres format is:postgresql://[user[:password]@][host][:port][,...][/dbname][?param1=value1&...]
many of these parameters can be specified in enviornmental variables in the normal Postgres convention and passwords will be looked up in the usual ways as allowed by thepgx
go library.--dump-dir
the dump directory where you would like the dump to be stored. The dump directory should not exist, it will be created as part of creating the dump, however the path to the directory should exist.
Optional parameters:
--jobs
Sets the number of jobs to run for the dump, by default it is set to 4 and will run in parallel mode, set to 0 to disable parallelism--verbose
Determines whether verbose output will be provided frompg_dump
. Defaults to false.--dump-roles
Determines whether to usepg_dumpall
to dump roles (without password information) before running the dump. Can be useful in order to restore permissions on tables etc. Defaults to true.--dump-tablespaces
Determines whether to usepg_dumpall
to dump tablespaces before running the dump. Can be useful if using multiple tablespaces and in restoring tables to the correct tablespaces. Defaults to true.--dump-pause-jobs
Determines whether to pause background jobs that could disrupt a parallel dump process by performing DDL during the dump. Defaults to true, only affects parallel dumps.--dump-pause-UDAs
Determines whether to pause user defined actions (available in Timescale 2.0+) when pausing jobs. Defaults to true, only affects parallel dumps where jobs are being paused. ---dump-job-finish-timeout
The number of seconds to wait for jobs which may perform DDL to finish before timing out. Defaults to 600 (10 minutes), set to -1 to not wait on jobs to finish. This only affects parallel dumps where jobs are being paused.
When a dump is created, the dump directory will be created, as well as a subdirectory for the dump. The main directory contains a JSON with TimescaleDB version information as well as any sql
files generated by pg_dumpall
.
Once you have a backup you can run a ts-restore
by specifying the same dump directory
and a new database uri. The database you are restoring to must already exist, so be sure
to create it before running the restore. By default, roles.sql
and tablespaces.sql
files are created in the dump directory. These may be run before the restore by running
psql -d <db-URI> -f <dump-dir>/roles.sql
& psql -d <db-URI> -f <dump-dir>/tablespaces.sql
.
The <db-URI>
parameter is specified in the same format as below. If you are restoring
multiple databases on the same postgres
instance, they only need to be run once and will
error if run multiple times, however errors resulting from roles or tablespaces being
created when they already exist can be safely disregarded.
If TimescaleDB is installed it will be dropped and re-created at the proper version, we recommend restoring only to an empty database.
You will need to provide the following parameters:
--db-URI
the database connection string in Postgres URI format to connect to. The Postgres format is:postgresql://[user[:password]@][host][:port][,...][/dbname][?param1=value1&...]
many of these parameters can be specified in enviornmental variables in the normal Postgres convention and passwords will be looked up in the usual ways as allowed by thepgx
go library.--dump-dir
the dump directory where the output fromts-dump
was stored.
Optional parameters:
--jobs
Sets the number of jobs to run for the restore, by default it is set to 4 and will run in parallel mode during the sections1 that are able to be parallelized. Set to 0 to disable parallelism.--verbose
Provide verbose output frompg_restore
. Defaults to true.--do-update
Update the TimescaleDB version to the latest default version immediately following the restore.2 Defaults to true.
As an example, let's suppose I have two postgres
clusters running on my machine, perhaps on versions 11 on port 5432 and 12 on 5433 and I wish to dump and restore in order to upgrade between versions:
I would run ts-dump --db-URI=postgresql://postgres:pwd1@localhost:5432/tsdb --dump-dir=~/dumps/dump1 --verbose --jobs=2
which will run the dump in verbose mode with 2 workers
Then : ts-restore --db-URI=postgresql://postgres:pwd1@localhost:5433/tsdb --dump-dir=~/dumps/dump1 --verbose --jobs=2
which will run in the same mode, and restore the dump I just created to the same database in the cluster running on port 5433.
We currently do not support many of the options that pg_dump
and pg_restore
do, please
submit issues or contributions for flags or options that you believe are important to
include.
Footnotes
-
In order to support parallel restores given the way the TimescaleDB catalog works, we first perform a pre-data restore, then restore the data for the catalog, then in parallel perform the data section for everything else and the post-data section for everything. ↩
-
This requires that you have the .so for the version you are restoring from and the version that you will update to. For instance, if your dump is from TimescaleDB 1.6.2 and the latest version is TimescaleDB 1.7.4, you need the .so from 1.6.2 available to restore to, and then it will update to 1.7.4 following the restore. Our default packages include several older versions to enable updates. ↩