incremental backup and incremental backfill generate different file names
Hi there!
First off, great library. It's super useful and a much better/simpler option (for me) than the whole EMR/Datapipeline situation.
I have this simple Lambda function subscribed to the streams of the tables I want to back up (the bucket, region, and prefix are set as env variables on the Lambda function):
```js
var replicator = require('dynamodb-replicator');

module.exports.streaming = (event, context, callback) => {
  return replicator.backup(event, callback);
};
```
Then I ran the backfill by importing `dynamodb-replicator/s3-backfill` and passing it a config object, roughly as sketched below.
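For context, the invocation looked something like this. The config shape (`table`, `region`, `backup.bucket`, `backup.prefix`) and the callback signature are my best reading of what `s3-backfill` expects, so treat them as assumptions rather than documented API:

```js
// A minimal sketch of the backfill invocation; field names are assumed.
var backfill = require('dynamodb-replicator/s3-backfill');

var config = {
  table: 'my-table',            // assumed: source DynamoDB table name
  region: 'us-east-1',          // assumed: table region
  backup: {
    bucket: 'my-backup-bucket', // assumed: destination S3 bucket
    prefix: 'backups'           // assumed: S3 key prefix
  }
};

backfill(config, function(err) {
  if (err) console.error(err);
});
```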
However, I noticed that when records get updated via the stream/lambda function, they are written to a different file from the one created by the backfill.
Looking at the source, the formula for generating the filename is slightly different in each:
```js
// s3-backfill
var id = crypto.createHash('md5')
    .update(Dyno.serialize(key))
    .digest('hex');
```

```js
// backup (index.js)
var id = crypto.createHash('md5')
    .update(JSON.stringify(change.dynamodb.Keys))
    .digest('hex');
```
https://github.com/mapbox/dynamodb-replicator/blob/master/s3-backfill.js#L46-L48
https://github.com/mapbox/dynamodb-replicator/blob/master/index.js#L130-L132
Does this make any practical difference? Should the restore function work regardless?
I've realized that `Dyno.serialize` in the backfill just converts plain JS objects to DynamoDB JSON, which is exactly what the backup gets from the stream. So I'm not sure why they generate different keys. Maybe it's the order of the stringified keys?
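To check that hypothesis, here is a small standalone demo (the `pk`/`sk` key names are made up for illustration). `JSON.stringify` preserves property insertion order, so the same logical DynamoDB key can hash to two different MD5s:

```js
var crypto = require('crypto');

function md5(str) {
  return crypto.createHash('md5').update(str).digest('hex');
}

// Same logical DynamoDB key, but properties inserted in a different order
var fromBackfill = { pk: { S: 'item-1' }, sk: { N: '42' } };
var fromStream   = { sk: { N: '42' }, pk: { S: 'item-1' } };

console.log(JSON.stringify(fromBackfill)); // {"pk":{"S":"item-1"},"sk":{"N":"42"}}
console.log(JSON.stringify(fromStream));   // {"sk":{"N":"42"},"pk":{"S":"item-1"}}
console.log(md5(JSON.stringify(fromBackfill)) === md5(JSON.stringify(fromStream))); // false
```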
Confirmed: sorting the key object before generating the id hash resolves this issue.
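For anyone hitting the same thing, here is a minimal sketch of an order-independent hash. The `stableStringify` helper is hypothetical (not part of the library); it just JSON-encodes with object keys sorted recursively:

```js
var crypto = require('crypto');

// Hypothetical helper: stringify with keys sorted at every level,
// so the output no longer depends on property insertion order.
function stableStringify(value) {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return '[' + value.map(stableStringify).join(',') + ']';
  return '{' + Object.keys(value).sort().map(function(k) {
    return JSON.stringify(k) + ':' + stableStringify(value[k]);
  }).join(',') + '}';
}

var keys = { sk: { N: '42' }, pk: { S: 'item-1' } }; // e.g. change.dynamodb.Keys

var id = crypto.createHash('md5')
    .update(stableStringify(keys))
    .digest('hex');
console.log(id); // same hash regardless of property order
```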