mapbox/dynamodb-replicator

incremental backup and incremental backfill generate different file names

Opened this issue · 2 comments

Hi there!

First off, great library. It's super useful and a much better/simpler option (for me) than the whole EMR/Datapipeline situation.

I have this simple lambda function that is subscribed to the tables I want to update:
(the bucket, region, and prefix are set as env variables in the lambda function)

var replicator = require('dynamodb-replicator')
module.exports.streaming = (event, context, callback) => {
  return replicator.backup(event, callback)
}

Then I ran the backfill by importing dynamodb-replicator/s3-backfill and passing it a config object.

However, I noticed that when records get updated via the stream/lambda function, they are written to a different file from the one created by the backfill.

I see that the formula for generating filenames is slightly different.

// backfill
            var id = crypto.createHash('md5')
                .update(Dyno.serialize(key))
                .digest('hex');

// backup
            var id = crypto.createHash('md5')
                .update(JSON.stringify(change.dynamodb.Keys))
                .digest('hex');

https://github.com/mapbox/dynamodb-replicator/blob/master/s3-backfill.js#L46-L48
https://github.com/mapbox/dynamodb-replicator/blob/master/index.js#L130-L132

Does this make any practical difference? Should the restore function work regardless?

I've realized that Dyno.serialize in backfill just converts plain JS objects to DynamoDB JSON, which is what we already get from the stream in backup. So I'm not sure why they generate different ids. Maybe the order of the keys in the stringified object?

Confirmed that sorting the key object before generating the id hash resolves this issue.