crate/cratedb-toolkit

DynamoDB: Support batching on full-load operations

amotl opened this issue · 3 comments

Problem

The DynamoDB Table Loader does not yet load data in batches. Batching needs to be implemented so that larger amounts of data can be transferred more efficiently.

Details

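# `key` holds the LastEvaluatedKey of the previous page, or None on the first call.
if key is None:
    response = table.scan(Limit=bulk_size)
else:
    # Resume the scan right after the last item of the previous page.
    response = table.scan(ExclusiveStartKey=key, Limit=bulk_size)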

Besides the snippet above, which may effectively just emulate batching by hand, the DynamoDB API also appears to offer a native operation called BatchGetItem. It might be the better choice from the start.

I did not look into the details yet, so please advise and correct me where I am wrong. Thank you very much. 🍀

Contrary to my previous assessment, the BatchGetItem and BatchExecuteStatement operations are not about retrieving multiple items in bulk, but rather about submitting multiple queries within a single request.
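
To illustrate the difference, here is a minimal sketch of how BatchGetItem is typically called through boto3. It assumes the caller already knows the primary keys of the items it wants, which is exactly what a full-load operation does not have. The helper name `batch_get`, the `dynamodb` parameter (any object exposing boto3's `batch_get_item`), and the `id` key attribute used in the comments are illustrative assumptions, not part of the toolkit.

```python
from typing import Any, Dict, List

# BatchGetItem accepts at most 100 keys per request.
MAX_KEYS_PER_REQUEST = 100


def batch_get(dynamodb: Any, table_name: str, keys: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fetch items by primary key, e.g. keys=[{"id": 1}, {"id": 2}] (hypothetical schema)."""
    items: List[Dict[str, Any]] = []
    for start in range(0, len(keys), MAX_KEYS_PER_REQUEST):
        chunk = keys[start:start + MAX_KEYS_PER_REQUEST]
        request = {table_name: {"Keys": chunk}}
        # DynamoDB may return only part of a batch; retry whatever is left.
        # Real code should add exponential backoff between retries.
        while request:
            response = dynamodb.batch_get_item(RequestItems=request)
            items.extend(response.get("Responses", {}).get(table_name, []))
            request = response.get("UnprocessedKeys") or {}
    return items
```

Because every key has to be supplied up front, this operation is a batch of point lookups rather than a way to walk a whole table.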

Using the Scan operation together with pagination, as shown in the code snippet in the OP, is the right choice.

An upcoming patch will implement the canonical Scan+Pagination procedure.
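
For reference, the Scan+Pagination loop can be sketched as follows. This is only an illustration built around the snippet from the OP, not the patch itself. The generator name `scan_batches` and the default `bulk_size` are assumptions; `table` is expected to behave like a boto3 `Table` resource, whose `scan` method returns `Items` and, while more pages remain, `LastEvaluatedKey`.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_batches(table: Any, bulk_size: int = 100) -> Iterator[List[Dict[str, Any]]]:
    """Yield the table's items in batches, one Scan page per batch."""
    key: Optional[Dict[str, Any]] = None
    while True:
        if key is None:
            response = table.scan(Limit=bulk_size)
        else:
            response = table.scan(ExclusiveStartKey=key, Limit=bulk_size)

        items = response.get("Items", [])
        # A page can be empty even though more pages follow, so skip it.
        if items:
            yield items

        # DynamoDB omits LastEvaluatedKey on the final page.
        key = response.get("LastEvaluatedKey")
        if key is None:
            break
```

With boto3, `table` could be obtained via `boto3.resource("dynamodb").Table("my-table")`, where the table name is a placeholder. Each yielded batch can then be handed to a bulk insert on the target side.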