A utility that allows CSV import / export to DynamoDB on the command line
Give a ⭐️ if you like this tool!
I made this command because I couldn't find a tool that met my modest need: importing CSV files into DynamoDB easily. It is a simple Python script, so it is easy to read and modify.
It works for me.
$ pip install dynamodb-csv
$ dynamodb-csv -h
usage: dynamodb-csv [-h] [-v] [-i] [-e] [--truncate] [--move] -t [TABLE ...] [-idx INDEX] [-f FILE] [-o OUTPUT] [--ignore]
[--profile PROFILE]
Import CSV file into DynamoDB table utilities
optional arguments:
-h, --help show this help message and exit
-v, --version show version
-i, --imp mode import
-e, --exp mode export
--truncate mode truncate
--move mode move
-t [TABLE ...], --table [TABLE ...]
DynamoDB table name
-idx INDEX, --index INDEX
DynamoDB index name
-f FILE, --file FILE UTF-8 CSV file path required import mode
-o OUTPUT, --output OUTPUT
output file path required export mode
--ignore ignore import error
--profile PROFILE using AWS profile
Setup and install
$ python -m venv venv
$ . venv/bin/activate
$ python setup.py install
$ dynamodb-csv -h
Or
$ python -m venv venv
$ . venv/bin/activate
$ pip install -r requirements-dev.txt
$ export PYTHONPATH=`pwd`
$ python app/main.py -h
For Windows
> python -m venv venv
> venv\Scripts\activate
> pip install -r requirements-dev.txt
> set PYTHONPATH=%cd%
> python app/main.py -h
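Or you can use the devcontainer.
You can also run the tool from the danishi/dynamodb-csv Docker image (replace tagname with an image tag):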
$ docker run --rm -v ${PWD}/:/local danishi/dynamodb-csv:tagname -i -t my_table -f sample.csv
For Windows
> docker run --rm -v %cd%/:/local danishi/dynamodb-csv:tagname -i -t my_table -f sample.csv
Configure your AWS credentials and the region of your DynamoDB table:
[AWS]
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
REGION=your_dynamodb_table_region
# Option
#ENDPOINT_URL=http://dynamodb-local:8000
These settings are not required if an AWS profile is specified with --profile.
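As a rough illustration of how the [AWS] settings relate to boto3, here is a minimal sketch that turns that section into keyword arguments for something like boto3.resource("dynamodb", **kwargs). The helper name load_aws_kwargs and the idea of reading the settings from a string are assumptions for illustration; this is not the tool's own code. The keyword names (aws_access_key_id, aws_secret_access_key, region_name, endpoint_url) are standard boto3 parameters.

```python
import configparser


def load_aws_kwargs(text: str) -> dict:
    """Map the [AWS] section to boto3-style keyword arguments.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case as written (configparser lowercases by default)
    parser.read_string(text)
    aws = parser["AWS"]
    kwargs = {
        "aws_access_key_id": aws.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": aws.get("AWS_SECRET_ACCESS_KEY"),
        "region_name": aws.get("REGION"),
    }
    # ENDPOINT_URL is optional, e.g. when pointing at DynamoDB Local
    endpoint = aws.get("ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


sample = """
[AWS]
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
REGION=your_dynamodb_table_region
# Option
#ENDPOINT_URL=http://dynamodb-local:8000
"""
print(load_aws_kwargs(sample))
```

Commented-out lines such as #ENDPOINT_URL are treated as comments by configparser, so the endpoint is only used once it is uncommented.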
Note
Prepare a UTF-8 CSV file in the format you want to import into your DynamoDB table, along with a spec file that defines that format.
Use the following examples as a reference.
StringPK,NumberSK,DecimalValue,BooleanValue,NullValue,JsonValue,StringListValues,DecimalListValues
foo,1,1.23,TRUE,,"[{""string"" : ""value""},{""number"" : 100}]",foo bar baz,10 10.1 20
foo,2,0.001,,,"[{""boolean"" : true}]",リンゴ バナナ スイカ,10 10.1 20
foo,3,1,,,"[{""boolean"" : false}]",,
# sample.csv data format specification
# String : S
# Integer : I
# Decimal : D
# Boolean : B (blank false)
# Json : J
# StringList : SL
# StringSet : SS
# DecimalList : DL
# DecimalSet : DS
[CSV_SPEC]
StringPK=S
NumberSK=I
DecimalValue=D
BooleanValue=B
NullValue=S
JsonValue=J
StringListValues=SL
StringSetValues=SS
DecimalListValues=DL
DecimalSetValues=DS
# [DELIMITER_OPTION]
# DelimiterCharacter=|
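The spec file uses INI-style sections, so a minimal sketch of reading it with Python's configparser might look like the following. The helper name load_spec is hypothetical and this is not necessarily how dynamodb-csv parses the file; it only shows how [CSV_SPEC] and the optional [DELIMITER_OPTION] fit together.

```python
import configparser


def load_spec(text: str) -> tuple[dict, str]:
    """Return ({column: type_code}, delimiter) from spec text.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # column names are case-sensitive
    parser.read_string(text)
    columns = dict(parser["CSV_SPEC"])
    delimiter = " "  # default delimiter for list and set types
    if parser.has_section("DELIMITER_OPTION"):
        delimiter = parser["DELIMITER_OPTION"].get("DelimiterCharacter", " ")
    return columns, delimiter


spec_text = """
# String : S
[CSV_SPEC]
StringPK=S
NumberSK=I
StringListValues=SL
# [DELIMITER_OPTION]
# DelimiterCharacter=|
"""
print(load_spec(spec_text))
```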
CSV_SPEC types map to DynamoDB attribute types as follows.
CSV_SPEC | DynamoDB attribute data type | example value |
---|---|---|
String : S | String | foo |
Integer : I | Number | 1 |
Decimal : D | Number | 1.23 |
Boolean : B | Boolean | TRUE |
Json : J | Map | [{""string"" : ""value""},{""number"" : 100}] |
StringList : SL | List | foo bar baz |
StringSet : SS | String Set | foo bar baz |
DecimalList : DL | List | 10 10.1 20 |
DecimalSet : DS | Number Set | 10 10.1 20 |
Sorry, the Binary and Binary Set types are not supported. For the Null type, see the ConvertBlankToNullAttrs import option.
The default delimiter for list and set types is a space.
To use a different delimiter, uncomment DELIMITER_OPTION and DelimiterCharacter in the spec file and set the character you want.
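To make the type table concrete, here is a minimal sketch of converting one CSV cell into a Python value that boto3 can serialize for each CSV_SPEC type code. The function name convert and the exact rules are assumptions, not the tool's implementation; in particular, treating "TRUE" case-insensitively as true is an assumption, since the spec only says that blank means false. JSON numbers are parsed as Decimal because boto3 does not accept Python floats.

```python
import json
from decimal import Decimal


def convert(value: str, type_code: str, delimiter: str = " "):
    """Convert one CSV cell to a boto3-serializable value.

    Hypothetical sketch of the CSV_SPEC mapping; not dynamodb-csv's own code.
    """
    if type_code == "S":
        return value
    if type_code == "I":
        return int(value)
    if type_code == "D":
        return Decimal(value)
    if type_code == "B":
        # Assumption: "TRUE" (any case) is true; blank or anything else is false
        return value.strip().upper() == "TRUE"
    if type_code == "J":
        # parse_float=Decimal because boto3 rejects Python floats
        return json.loads(value, parse_float=Decimal)
    # List and set types: split on the delimiter and drop empty pieces
    parts = [p for p in value.split(delimiter) if p]
    if type_code == "SL":
        return parts
    if type_code == "SS":
        return set(parts)
    if type_code == "DL":
        return [Decimal(p) for p in parts]
    if type_code == "DS":
        return {Decimal(p) for p in parts}
    raise ValueError(f"unsupported type code: {type_code}")


print(convert("10 10.1 20", "DL"))
print(convert('[{"number" : 100}]', "J"))
```

Note that Python's csv module already turns the doubled quotes in the sample CSV (""string"") back into ordinary quotes before a cell reaches a function like this. DynamoDB does not accept empty sets, so a blank SS or DS cell would need separate handling.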
Note
You must first create a DynamoDB table that matches your specification.
$ aws dynamodb create-table --cli-input-json file://my_table.json --region ap-northeast-1
$ aws dynamodb describe-table --table-name my_table
{
"Table": {
"AttributeDefinitions": [
{
"AttributeName": "NumberSK",
"AttributeType": "N"
},
{
"AttributeName": "StringPK",
"AttributeType": "S"
}
],
"TableName": "my_table",
"KeySchema": [
{
"AttributeName": "StringPK",
"KeyType": "HASH"
},
{
"AttributeName": "NumberSK",
"KeyType": "RANGE"
}
],
"TableStatus": "ACTIVE",
"CreationDateTime": "2022-06-26T21:19:21.767000+09:00",
"ProvisionedThroughput": {
"NumberOfDecreasesToday": 0,
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
},
"TableSizeBytes": 0,
"ItemCount": 0,
"TableArn": "arn:aws:dynamodb:ap-northeast-1:XXXXXXXXXXX:table/my_table",
"TableId": "XXXXXXXX-925b-4cb1-8e3a-604158118c3f",
"GlobalSecondaryIndexes": [
{
"IndexName": "NumberSK-index",
"KeySchema": [
{
"AttributeName": "NumberSK",
"KeyType": "HASH"
}
],
"Projection": {
"ProjectionType": "INCLUDE",
"NonKeyAttributes": [
"DecimalValue",
"JsonValue"
]
},
"IndexStatus": "ACTIVE",
"ProvisionedThroughput": {
"NumberOfDecreasesToday": 0,
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
},
"IndexSizeBytes": 0,
"ItemCount": 0,
"IndexArn": "arn:aws:dynamodb:ap-northeast-1:XXXXXXXXXXX:table/my_table/index/NumberSK-index"
}
]
}
}
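The my_table.json file passed to create-table is not shown above. A minimal sketch that writes a CreateTable input equivalent to the describe-table output might look like this; the helper name create_table_input is hypothetical, while the keys (TableName, AttributeDefinitions, KeySchema, ProvisionedThroughput, GlobalSecondaryIndexes) are the standard CreateTable request fields.

```python
import json


def create_table_input() -> dict:
    """Build a CreateTable request matching the describe-table output above.

    Hypothetical helper for illustration; adjust names and capacity to your needs.
    """
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    return {
        "TableName": "my_table",
        # Only key attributes (table and index keys) are declared here
        "AttributeDefinitions": [
            {"AttributeName": "StringPK", "AttributeType": "S"},
            {"AttributeName": "NumberSK", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "StringPK", "KeyType": "HASH"},
            {"AttributeName": "NumberSK", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": throughput,
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "NumberSK-index",
                "KeySchema": [{"AttributeName": "NumberSK", "KeyType": "HASH"}],
                "Projection": {
                    "ProjectionType": "INCLUDE",
                    "NonKeyAttributes": ["DecimalValue", "JsonValue"],
                },
                "ProvisionedThroughput": throughput,
            }
        ],
    }


# Write the file referenced by: aws dynamodb create-table --cli-input-json file://my_table.json
with open("my_table.json", "w", encoding="utf-8") as f:
    json.dump(create_table_input(), f, indent=2)
```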
The import command requires the CSV spec file to be in the same directory as the CSV file.
$ dynamodb-csv -i -t my_table -f sample.csv
please wait my_table importing sample.csv
300it [00:00, 19983.03it/s]
100%|████████████████████████████████████████| 300/300 [00:07<00:00, 40.97it/s]
my_table csv imported 300 items
Items are written quickly using batch writes.
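DynamoDB's BatchWriteItem accepts at most 25 put or delete requests per call, so batch writing boils down to splitting items into groups of 25. The sketch below shows only that chunking step; the helper name chunks is hypothetical. In practice boto3's Table.batch_writer() does this buffering (and resends unprocessed items) for you, and the tool's internals may differ.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunks(items: Iterable, size: int = 25) -> Iterator[list]:
    """Yield lists of at most `size` items (25 is the BatchWriteItem limit).

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


sizes = [len(b) for b in chunks({"n": i} for i in range(300))]
print(sizes[:3], len(sizes))  # [25, 25, 25] 12
```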
If some records cause errors, such as a key schema mismatch, you can use the --ignore option to skip those CSV records.
$ dynamodb-csv -i -t my_table -f sample.csv --ignore
please wait my_table importing sample.csv
300it [00:00, 19983.03it/s]
100%|████████████████████████████████████████| 300/300 [00:07<00:00, 40.97it/s]
my_table csv imported 299 items and 1 error items
When this option is used, items are written one at a time instead of by batch write.
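The behavior of --ignore can be pictured as a per-item write loop that counts failures instead of stopping. This is a minimal sketch under assumptions: import_one_by_one and fake_put are hypothetical names, and put stands in for something like boto3's table.put_item(Item=item). A real implementation would catch botocore's ClientError rather than every exception.

```python
from typing import Callable, Iterable


def import_one_by_one(items: Iterable[dict], put: Callable[[dict], None]) -> tuple[int, int]:
    """Write items individually, counting failures instead of aborting.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    imported = errors = 0
    for item in items:
        try:
            put(item)  # e.g. table.put_item(Item=item) with boto3
            imported += 1
        except Exception:  # a real tool would catch botocore's ClientError
            errors += 1
    return imported, errors


def fake_put(item: dict) -> None:
    # Stand-in for DynamoDB rejecting an item whose key does not match the schema
    if "StringPK" not in item:
        raise ValueError("key schema mismatch")


rows = [{"StringPK": "foo", "NumberSK": i} for i in range(299)] + [{"NumberSK": 999}]
print(import_one_by_one(rows, fake_put))  # (299, 1)
```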
By default, an empty value in the CSV is imported as an empty value.
To store it as Null instead, or to omit the attribute entirely, add an IMPORT_OPTION section to the spec file:
[IMPORT_OPTION]
ConvertBlankToNullAttrs=NullValue,JsonValue
ConvertBlankToDropAttrs=DecimalValue
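A minimal sketch of what these two options could do to a single CSV row, assuming they are applied before type conversion. The helper name apply_blank_options is hypothetical. Python None is what boto3 serializes as the DynamoDB Null type; in this sketch, an attribute listed in both options is dropped.

```python
def apply_blank_options(row: dict, to_null: set, to_drop: set) -> dict:
    """Apply ConvertBlankToNullAttrs / ConvertBlankToDropAttrs to one CSV row.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    result = {}
    for name, value in row.items():
        if value == "":
            if name in to_drop:
                continue  # omit the attribute entirely
            if name in to_null:
                result[name] = None  # boto3 serializes None as the Null type
                continue
        result[name] = value
    return result


to_null = set("NullValue,JsonValue".split(","))
to_drop = set("DecimalValue".split(","))
row = {"StringPK": "foo", "NumberSK": "3", "DecimalValue": "", "NullValue": "", "JsonValue": ""}
print(apply_blank_options(row, to_null, to_drop))
```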
You may also need to load the same data into multiple tables, so data can be exported as well.
As with import, a CSV spec file is required.
$ dynamodb-csv -e -t my_table -o sample_exp.csv
please wait my_table exporting sample_exp.csv
100%|████████████████████████████████████████| 300/300 [00:00<00:00, 16666.77it/s]
my_table csv exported 300 items
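Export is roughly the reverse of the import mapping: each DynamoDB value is formatted back into a CSV cell according to the spec. The sketch below is an assumption about how that could work, not the tool's code; to_cell and write_csv are hypothetical names, and writing a false Boolean as a blank cell is an assumption chosen to mirror "blank false". boto3 returns numbers as Decimal, which the json module cannot encode directly, hence the custom default.

```python
import csv
import io
import json
from decimal import Decimal


def _json_default(obj):
    # boto3 returns DynamoDB numbers as Decimal; emit them as int or float
    if isinstance(obj, Decimal):
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


def to_cell(value, type_code: str, delimiter: str = " ") -> str:
    """Format one DynamoDB value as a CSV cell (hypothetical helper)."""
    if value is None:
        return ""
    if type_code == "B":
        return "TRUE" if value else ""  # assumption: false is written as blank
    if type_code == "J":
        return json.dumps(value, default=_json_default)
    if type_code in ("SL", "SS", "DL", "DS"):
        # Sort sets so the output is deterministic; keep list order as-is
        values = sorted(value) if isinstance(value, set) else value
        return delimiter.join(str(v) for v in values)
    return str(value)


def write_csv(items, spec: dict, out, delimiter: str = " ") -> None:
    """Write items as CSV using the column order of the spec (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=list(spec))
    writer.writeheader()
    for item in items:
        writer.writerow({name: to_cell(item.get(name), code, delimiter) for name, code in spec.items()})


spec = {"StringPK": "S", "NumberSK": "I", "JsonValue": "J", "DecimalListValues": "DL"}
items = [{
    "StringPK": "foo",
    "NumberSK": Decimal("1"),
    "JsonValue": [{"number": Decimal("100")}],
    "DecimalListValues": [Decimal("10"), Decimal("10.1")],
}]
buf = io.StringIO()
write_csv(items, spec, buf)
print(buf.getvalue())
```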
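To export from a global secondary index, pass the index name with -idx:
$ dynamodb-csv -e -t my_table -idx NumberSK-index -o sample_gsi_exp.csv
To export only the items that match a query, add a QUERY_OPTION section to the spec file:
$ dynamodb-csv -e -t my_table -idx NumberSK-index -o sample_query_exp.csv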
# sample_query_exp.csv data format specification
# Integer : I
# String : S
# Decimal : D
# Json : J
[QUERY_OPTION]
PKAttribute=NumberSK
PKAttributeValue=1
PKAttributeType=I
[CSV_SPEC]
NumberSK=I
StringPK=S
DecimalValue=D
JsonValue=J
key | description | example |
---|---|---|
PKAttribute | Partition key attribute name | |
PKAttributeValue | Partition key attribute query value | |
PKAttributeType | Partition key attribute data type | |
SKAttribute | Sort key attribute name | |
SKAttributeValues | Sort key attribute query value or values | foo or foo,bar |
SKAttributeType | Sort key attribute data type | |
SKAttributeExpression | Sort key attribute query expression | begins_with, between, eq, gt, gte, lt, lte |
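To show how these keys relate to a DynamoDB Query, here is a minimal sketch that turns QUERY_OPTION values into a KeyConditionExpression with placeholder names and values, in the form accepted by boto3's Table.query(**kwargs). The helper name build_query and the casting rules are assumptions, not the tool's implementation; when querying an index, IndexName would be passed to the query call as well.

```python
from decimal import Decimal

# Comparison operators for SKAttributeExpression values other than between / begins_with
OPERATORS = {"eq": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
# Assumed casts from CSV_SPEC type codes to Python values
CASTS = {"S": str, "I": int, "D": Decimal}


def build_query(opts: dict) -> dict:
    """Turn QUERY_OPTION values into Table.query keyword arguments.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    names = {"#pk": opts["PKAttribute"]}
    values = {":pk": CASTS[opts["PKAttributeType"]](opts["PKAttributeValue"])}
    expression = "#pk = :pk"
    if "SKAttribute" in opts:
        names["#sk"] = opts["SKAttribute"]
        cast = CASTS[opts["SKAttributeType"]]
        sk_values = [cast(v) for v in opts["SKAttributeValues"].split(",")]
        op = opts.get("SKAttributeExpression", "eq")
        if op == "between":
            values[":sk0"], values[":sk1"] = sk_values  # exactly two values expected
            expression += " AND #sk BETWEEN :sk0 AND :sk1"
        elif op == "begins_with":
            values[":sk0"] = sk_values[0]
            expression += " AND begins_with(#sk, :sk0)"
        else:
            values[":sk0"] = sk_values[0]
            expression += f" AND #sk {OPERATORS[op]} :sk0"
    return {
        "KeyConditionExpression": expression,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


opts = {
    "PKAttribute": "StringPK", "PKAttributeValue": "bar", "PKAttributeType": "S",
    "SKAttribute": "NumberSK", "SKAttributeValues": "50,100", "SKAttributeType": "I",
    "SKAttributeExpression": "between",
}
print(build_query(opts))
```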
A query can also include a sort key condition:
$ dynamodb-csv -e -t my_table -o sample_query_exp2.csv
[QUERY_OPTION]
PKAttribute=StringPK
PKAttributeValue=bar
PKAttributeType=S
SKAttribute=NumberSK
SKAttributeValues=50,100
SKAttributeType=I
SKAttributeExpression=between
While experimenting with imports you may want to clear out unneeded data, so a command to truncate a table is also provided.
$ dynamodb-csv --truncate -t my_table
my_table scan 300 items
please wait my_table truncating
100%|████████████████████████████████████████| 300/300 [00:07<00:00, 40.95it/s]
my_table truncated
Caution
This operation is irreversible. Take care.
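Deleting an item only requires its primary key, so truncating a table amounts to scanning every item and deleting each one by key. A minimal sketch of the key-extraction step follows; key_of is a hypothetical helper, and the boto3 calls in the comment (table.key_schema, batch_writer().delete_item) are one possible way to use it, not necessarily how the tool does it.

```python
def key_of(item: dict, key_schema: list) -> dict:
    """Keep only the primary-key attributes, which is all a delete needs.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    With boto3 this could be used roughly as:
        with table.batch_writer() as batch:
            batch.delete_item(Key=key_of(item, table.key_schema))
    """
    return {k["AttributeName"]: item[k["AttributeName"]] for k in key_schema}


schema = [
    {"AttributeName": "StringPK", "KeyType": "HASH"},
    {"AttributeName": "NumberSK", "KeyType": "RANGE"},
]
print(key_of({"StringPK": "foo", "NumberSK": 1, "DecimalValue": "1.23"}, schema))
```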
Move all items from one table to another.
The destination table must already exist with the same schema.
Items in the source table are not deleted, so this behaves like a copy.
$ dynamodb-csv --move -t my_table_from my_table_to
my_table_from scan 300 items
please wait my_table_to moving
100%|████████████████████████████████████████| 300/300 [00:15<00:00, 20.00it/s]
my_table_to moved 300 items
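A Scan returns results in pages, so copying every item requires following LastEvaluatedKey until it is absent. This minimal sketch shows that pagination loop with an injected scan callable; scan_all and fake_scan are hypothetical names. With boto3 one might pass source_table.scan and write each item through the destination table's batch_writer().put_item(Item=item), but that wiring is an assumption, not the tool's documented implementation.

```python
from typing import Callable, Iterator


def scan_all(scan: Callable[..., dict]) -> Iterator[dict]:
    """Yield every item from a paginated Scan, following LastEvaluatedKey.

    Hypothetical helper for illustration; not part of dynamodb-csv.
    """
    kwargs = {}
    while True:
        page = scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the next page where the previous one stopped
        kwargs["ExclusiveStartKey"] = last_key


# Fake two-page Scan standing in for a real table
pages = {
    None: {"Items": [{"id": 1}, {"id": 2}], "LastEvaluatedKey": {"id": 2}},
    2: {"Items": [{"id": 3}]},
}


def fake_scan(ExclusiveStartKey=None):
    key = ExclusiveStartKey["id"] if ExclusiveStartKey else None
    return pages[key]


print([item["id"] for item in scan_all(fake_scan)])  # [1, 2, 3]
```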
See LICENSE