dynamo-backup-to-s3
Stream DynamoDB backups to S3.
dynamo-backup-to-s3 is a utility to stream DynamoDB data to S3. Since the data is streamed directly from DynamoDB to S3 it is suitable for copying large tables directly. Tables are copied in parallel.
Can be run as a command line script or as an npm module.
Command line usage
Usage: dynamo-backup-to-s3 [options]
Options:
-h, --help output usage information
-V, --version output the version number
-b, --bucket <name> S3 bucket to store backups
-s, --stop-on-failure specify the reporter to use
-r, --read-percentage <decimal> specific the percentage of Dynamo read capacity to use while backing up. default .25 (25%)
-x, --excluded-tables <list> exclude these tables from backup
-i, --included-tables <list> only backup these tables
-p, --backup-path <name> backup path to store table dumps in. default is DynamoDB-backup-YYYY-MM-DD-HH-mm-ss
-e, --base64-encode-binary if passed, encode binary fields in base64 before exporting
-d, --save-datapipeline-format save in format compatible with the AWS datapipeline import. Default to false (save as exported by DynamoDb)
--aws-key AWS access key. Will use AWS_ACCESS_KEY_ID env var if --aws-key not set
--aws-secret AWS secret key. Will use AWS_SECRET_ACCESS_KEY env var if --aws-secret not set
--aws-region AWS region. Will use AWS_DEFAULT_REGION env var if --aws-region not set
npm module usage
Quick Example
var DynamoBackup = require('dynamo-backup-to-s3');
var backup = new DynamoBackup({
excludedTables: ['development-table1', 'development-table2'],
readPercentage: .5,
bucket: 'my-backups',
stopOnFailure: true,
base64Binary: true,
awsAccessKey: /* AWS access key */,
awsSecretKey: /* AWS secret key */,
awsRegion: /* AWS region */
});
backup.on('error', function(data) {
console.log('Error backing up ' + data.table);
console.log(data.err);
});
backup.on('start-backup', function(tableName, startTime) {
console.log('Starting to copy table ' + tableName);
});
backup.on('end-backup', function(tableName, backupDuration) {
console.log('Done copying table ' + tableName);
});
backup.backupAllTables(function() {
console.log('Finished backing up DynamoDB');
});
Documentation
Constructor
var options = {
excludedTables: /* tables to exclude from backup */,
includedTables: /* only back up these tables */
readPercentage: /* only consume this much capacity. expressed as a decimal (i.e. .5 means use 50% of table read capacity). default: .25 */,
bucket: /* bucket to upload the backup to */,
stopOnFailure: /* whether or not to continue backing up if a single table fails to back up */,
saveDataPipelineFormat /* save in format compatible with the AWS datapipeline import. Default to false (save as exported by DynamoDb) */,
awsAccessKey: /* AWS access key */,
awsSecretKey: /* AWS secret key */,
awsRegion: /* AWS region */,
backupPath: /* folder to save backups in. default: 'DynamoDB-backup-YYYY-MM-DD-HH-mm-ss',
base64Binary: /* whether or not to base64 encode binary data before saving to JSON */
};
var backup = new DynamoBackup(options);
Events
error
Raised when there is an error backing up a table
Example
backup.on('error', function(data) {
console.log('Error backing up ' + data.tableName);
console.log(data.error);
});
start-backup
Raised when the backup of a table is begining
Example
backup.on('start-backup', function(tableName) {
console.log('Starting to copy table ' + tableName);
});
end-backup
Raised when the backup of a table is finished
Example
backup.on('end-backup', function(tableName) {
console.log('Done copying table ' + tableName);
});
Functions
backupAllTables
Backups all tables in the given region while respecting the excludedTables
and includedTables
options
Arguments
callback(err)
- callback which is called when all backups are complete, or an error occurs andstopOnFailure
is true
backupTable
Backups all tables in the given region while respecting the excludedTables
and includedTables
options
Arguments
tableName
- name of the table to backupbackupPath
- (optional) the path to use for the backup. The iterator is passed acallback(err)
which must be called once it has completed. If no error has occurred, thecallback
should be run without arguments or with an explicitnull
argument.callback(err)
- A callback which is called when the table has finished backing up, or an error occurs
dynamo-restore-from-s3
Restore S3 backups back to Dynamo.
dynamo-restore-from-s3
is a utility that restores backups in S3 back to dynamo. It streams data down from S3 and throttles the download speed to match the rate of batch writes to Dynamo.
It is suitable for restoring large tables without needing to write to disk or use a large amount of memory. Use it on an AWS EC2 instance for best results and to minimise network latency, this should yield restore speeds of around 15min per GB.
Use --overwrite
if the table already exists. Otherwise it will attempt to generate table on the fly.
Can be run as a command line script or as an npm module.
Command line usage
Usage: dynamo-restore-from-s3 [options] -s "s3://mybucket/path/to/file.json" -t "new-dynamodb-table"
Options:
-h, --help output usage information
-V, --version output the version number
-s, --source [path] Full S3 path to a JSON backup file (Required)
-t, --table [name] Name of the Dynamo Table to restore to (Required)
-o, --overwrite Table already exists, skip auto-create. Default is false.
-c, --concurrency <requestcount> Number of concurrent requests & dynamo capacity units. Defaults to 200.
-pk, --partitionkey [columnname] Name of Primary Partition Key. If not provided will try determine from backup.
-sk, --sortkey [columnname] Name of Secondary Sort Key. Ignored unless --partitionkey is provided.
-rc, --readcapacity <units> Read Units for new table (when finished). Default is 5.
-wc, --writecapacity <units> Write Units for new table (when finished). Default is --concurrency.
-sf, --stop-on-failure Stop process when the same batch fails to restore multiple times. Defaults to false.
--aws-key <key> AWS access key. Will use AWS_ACCESS_KEY_ID env var if --aws-key not set
--aws-secret <secret> AWS secret key. Will use AWS_SECRET_ACCESS_KEY env var if --aws-secret not set
--aws-region <region> AWS region. Will use AWS_DEFAULT_REGION env var if --aws-region not set
Examples
# Restore over existing table (cmd.exe).
> node ./bin/dynamo-restore-from-s3 -t acme-customers -s s3://my-backups/acme-customers.json --overwrite
# Restore over existing table (shell).
$ ./bin/dynamo-restore-from-s3 -t acme-customers -s s3://my-backups/acme-customers.json --overwrite
# Restore over existing table, 1000 concurrent requests. Stop if any batch fails 1000 times.
$ ./bin/dynamo-restore-from-s3 -t acme-customers -c 1000 -s s3://my-backups/acme-customers.json --overwrite -sf
# Restore over existing table, 1000 concurrent requests. When finished, set read capacity to 50 and write capacity to 10 (both needed).
$ ./bin/dynamo-restore-from-s3 -t acme-customers -c 1000 -s s3://my-backups/acme-customers.json --overwrite --readcapacity 50 --writecapacity 10
# Auto-generate table (determine PK from backup).
$ ./bin/dynamo-restore-from-s3 -t acme-customers -s s3://my-backups/acme-customers.json
# Auto-generate table with partition and sort key.
$ ./bin/dynamo-restore-from-s3 -t acme-orders -s s3://my-backups/acme-orders.json -pk customerId -sk createDate
# Auto-generate table, defined PK. Concurrency 2000 (~ 2GB backup).
$ ./bin/dynamo-restore-from-s3 -t acme-orders -pk orderId -c 2000 -s s3://my-backups/acme-orders.json
# Auto-generate table. 2000 write units during restore. When finished set 50 write units and 100 write units (both needed).
$ ./bin/dynamo-restore-from-s3 -t acme-orders -c 2000 -s s3://my-backups/acme-orders.json --readcapacity 100 --writecapacity 50
# Auto-generate table. Concurrency 50 (10 MB backup or less).
$ ./bin/dynamo-restore-from-s3 -t acme-orders -c 50 -s s3://my-backups/acme-orders.json
# Auto-generate table. Concurrency 50. Stop process if any batch fails 50 times.
$ ./bin/dynamo-restore-from-s3 -t acme-orders -c 50 -sf -s s3://my-backups/acme-orders.json
npm module usage
Quick Example
var DynamoRestore = require('dynamo-backup-to-s3').Restore;
var restore = new DynamoRestore({
source: 's3://my-backups/DynamoDB-backup-2016-09-28-15-36-40/acme-customers-prod.json',
table: 'acme-customers-dev',
overwrite: true,
concurrency: 200, // for large restores use 1 unit per MB as a rule of thumb (ie 1000 for 1GB restore)
awsAccessKey: /* AWS access key */,
awsSecretKey: /* AWS secret key */,
awsRegion: /* AWS region */
});
restore.on('error', function(message) {
console.log(message);
process.exit(-1);
});
restore.on('warning', function(message) {
console.log(message);
});
restore.on('send-batch', function(batches, requests, streamMeta) {
console.log('Batch sent. %d in flight. %d Mb remaining to download...', requests, streamMeta.RemainingLength / (1024 * 1024));
});
restore.run(function() {
console.log('Finished restoring DynamoDB table');
});
Constructor
var options = {
source: /* path to json file in s3 bucket, should start with s3://bucketname/... */,
table: /* name of dynamo table, will be created on the fly unless overwritten */,
overwrite: /* true/false if table already exits (defaults to false) */
concurrency: /* number of concurrent requests (and dynamo write capacity units) */,
partitionkey: /* name of partition key column */,
sortkey: /* name of secondary (sort) key column */ ,
readcapacity: /* number of read capacity units (when restore finishes) */,
writecapacity: /* number of write capacity units (when restore finishes) */,
stopOnFailure: /* true/false should a single failed batch stop the whole restore job? */,
awsAccessKey: /* AWS access key */,
awsSecretKey: /* AWS secret key */,
awsRegion: /* AWS region */
};
var restore = new DynamoBackup.Restore(options);
Events
error
Raised when there is a fatal error restoring a table
Example
restore.on('error', function() {
console.log('Error!! + ' + message);
});
warning
Raised when there is a warning restoring a table. Normally this will be a failed batch.
Example
restore.on('warning', function() {
console.log('Warning!! + ' + message);
});
send-batch
Raised whenever a batch is sent to Dynamo. Useful for tracking progress.
Example
restore.on('send-batch', function(batches, requests, streamMeta) {
console.log('Batch Sent');
console.log('Num cached batches: ', batches);
console.log('Num requests in flight: ', requests);
console.log('Stream metadata:, JSON.stringify(streamMeta));
});
finish
Raised when the restore process is finished.
Example
restore.on('finish', function() {
console.log('All done!');
});
Functions
run
Restores table with options as defined in constructor.
Arguments
callback(err)
- callback to execute when restore job is complete. First argument exists only if there is an error.
Example
restore.run(function(error) {
if (error) {
return console.log(error);
}
console.log('All done!');
});