Script for formatting an AWS DynamoDB dump into a proper JSON file.
You need Node.js and sed installed.
Use the run.sh script to transform a raw DynamoDB dump file into a JSON file:
$ ./run.sh <dump_file_name> <output_file_name>
You can also run the script without parameters. It then uses the default parameter values, which is the same as running:
$ ./run.sh raw-data output-data.json
The file raw-data is included in this repository and contains some example DynamoDB dump data.
To get a DynamoDB dump from AWS, you need to set up a data pipeline. The AWS documentation explains how:
Export Data from DynamoDB
First, the sed command-line tool replaces the control characters in the dump file (\x02 and \x03) and formats each line as a JSON object. Example of the command:
sed -e 's/$/}/' -e $'s/\x02/,"/g' -e $'s/\x03/":/g' -e 's/^/{"/' <dump_file_name> > <output_file_name>
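To make the four sed expressions easier to follow, here is a rough JavaScript equivalent. It is only a sketch: the helper name dumpLineToJson and the sample line are hypothetical, and the line layout (key, \x03, typed value, with \x02 between fields) is inferred from the sed command rather than from an AWS specification.

```javascript
// Hypothetical helper mirroring the sed expressions above (not part of run.sh).
function dumpLineToJson(line) {
  const body = line
    .replace(/\x02/g, ',"')   // \x02 between fields -> comma plus opening quote of the next key
    .replace(/\x03/g, '":');  // \x03 between key and value -> closing quote plus colon
  // Same as s/^/{"/ and s/$/}/ : open the first key and close the object.
  return '{"' + body + '}';
}

// Assumed sample line; real dump lines may contain more fields.
const sampleLine = 'id\x03{"n":"1"}\x02name\x03{"s":"Ann"}';
console.log(dumpLineToJson(sampleLine));
// prints {"id":{"n":"1"},"name":{"s":"Ann"}}
```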
Then a small JavaScript script converts each object to a proper JSON representation with plain JSON values (the dump stores {"key": {"s": "string value"}} instead of {"key": "string value"}).
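The unwrapping step can be pictured roughly as follows. This is a minimal sketch, not the script shipped in this repository: the names unwrapValue and unwrapItem are hypothetical, converting "n" values with Number is an assumption (very large numbers would lose precision), and throwing on unknown types is a choice made for this example.

```javascript
// Hypothetical sketch: turn {"key": {"s": "v"}} into {"key": "v"}.
function unwrapValue(typed) {
  if ('s' in typed) return typed.s;          // string value, kept as-is
  if ('n' in typed) return Number(typed.n);  // number value, stored as a string in the dump
  throw new Error('Unsupported type: ' + Object.keys(typed).join(','));
}

function unwrapItem(item) {
  const plain = {};
  for (const key of Object.keys(item)) {
    plain[key] = unwrapValue(item[key]);
  }
  return plain;
}

const unwrapped = unwrapItem({ key: { s: 'string value' }, count: { n: '42' } });
console.log(JSON.stringify(unwrapped));
// prints {"key":"string value","count":42}
```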
Finally, the JSON objects are wrapped in an array and written to the output file.
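As an illustration of this last step, the sketch below reads a file with one JSON object per line and writes a single JSON array. The function name wrapLinesInArray and the pretty-printing with two spaces are assumptions made for this example; the actual script may do this differently.

```javascript
const fs = require('fs');

// Hypothetical sketch: collect one-object-per-line JSON into one array file.
function wrapLinesInArray(inputPath, outputPath) {
  const objects = fs.readFileSync(inputPath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')  // skip blank lines, e.g. a trailing newline
    .map((line) => JSON.parse(line));      // each remaining line is one JSON object
  fs.writeFileSync(outputPath, JSON.stringify(objects, null, 2));
  return objects.length;                   // number of objects written
}
```

It could be called as wrapLinesInArray('cleaned-data', 'output-data.json'), where 'cleaned-data' stands for a hypothetical intermediate file produced by the sed step.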
Be aware that the JavaScript script currently handles only number and string values in the dump. To make it handle other types, please tweak the script.
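One possible way to extend the conversion to more types is sketched below. It is an assumption-heavy example: only the "s" tag is confirmed by the dump format shown above, so the sketch compares type tags case-insensitively and guesses that booleans, nulls, lists, maps, and sets follow the DynamoDB AttributeValue names (BOOL, NULL, L, M, SS, NS). The function name unwrapTyped is hypothetical, and binary types are left out.

```javascript
// Hypothetical sketch: recursive unwrapping of more DynamoDB value types.
function unwrapTyped(typed) {
  const tag = Object.keys(typed)[0];
  const value = typed[tag];
  switch (tag.toLowerCase()) {             // exact tag spelling in the dump is an assumption
    case 's': return value;
    case 'n': return Number(value);
    case 'bool': return value === true || value === 'true';
    case 'null': return null;
    case 'l': return value.map(unwrapTyped);  // list of typed values
    case 'm': {                               // map of attribute name to typed value
      const out = {};
      for (const key of Object.keys(value)) out[key] = unwrapTyped(value[key]);
      return out;
    }
    case 'ss': return value.slice();          // string set, kept as an array
    case 'ns': return value.map(Number);      // number set, kept as an array
    default: throw new Error('Unsupported type: ' + tag);
  }
}

const nested = unwrapTyped({ m: { tags: { l: [{ s: 'x' }, { n: '2' }] } } });
console.log(JSON.stringify(nested));
// prints {"tags":["x",2]}
```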
The full list of DynamoDB value types is available in the AWS documentation:
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_AttributeValue.html
This script was inspired by the repository https://github.com/JasonGhent/AWS-DynamoDB-to-MongoDB. Thank you, Jason Ghent, for your help!
Copyright (C) 2016 Pavlo Voznenko.
Distributed under the MIT License.