awslabs/amazon-documentdb-tools

Migrator Tool with Change Streams Missing Some Documents

Closed this issue ยท 2 comments

๐Ÿ‘‹ Hey,

We're attempting to use the Migrator Tool with Change Streams to move data from DocumentDB to Sharded MongoDB (i.e. we're issuing the writes to a MongoS instance)

It appears to work for the most part, but misses a few 10's of documents so not every document is transferred.

For example our exchanges.league_event collection in DocumentDB has 681,653 documents.

The migrator tool moves 681,601 documents to MongoDB and then the remaining 52 documents are never transferred

These are the logs from the tool

python3 cdc-multiprocess.py --source-uri 'mongodb://aarontest:<PASSWORD>i@t-aarontest-docdb-240528-1229-9f.cluster-cxismqp4lkcr.us-east-1.docdb.amazonaws.com:27018/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false' --target-uri 172.27.79.116:27017 --source-namespace exchanges.league_event --use-change-stream --start-position NOW

[2024-05-30T14:46:33.797Z] thread  -1 | processing changeStream using 1 threads

[2024-05-30T14:46:33.797Z] thread  -1 | starting with timestamp = 2024-05-30 14:46:33+00:00
[2024-05-30T14:47:33.933Z] elapsed 00:01:00.05 | total o/s         0.00 | interval o/s         0.00 | tot                0 |            0 secs behind | resume token = N/A

[2024-05-30T14:48:33.993Z] elapsed 00:02:00.11 | total o/s        20.82 | interval o/s        41.64 | tot            2,501 |            3 secs behind | resume token = 01665891be00005b9c0100005b9c00109397

[2024-05-30T14:49:34.136Z] elapsed 00:03:00.17 | total o/s       304.17 | interval o/s       870.80 | tot           54,801 |           31 secs behind | resume token = 01665891df0000298e010000298e00109397

[2024-05-30T14:50:34.405Z] elapsed 00:04:00.31 | total o/s       913.41 | interval o/s     2,738.43 | tot          219,501 |           58 secs behind | resume token = 0166589200000006c301000006c300109397

[2024-05-30T14:51:34.617Z] elapsed 00:05:00.57 | total o/s     1,282.89 | interval o/s     2,756.27 | tot          385,601 |           86 secs behind | resume token = 016658922000001feb0100001feb00109397

[2024-05-30T14:52:34.874Z] elapsed 00:06:00.79 | total o/s     1,684.09 | interval o/s     3,686.65 | tot          607,601 |          102 secs behind | resume token = 016658924c00000643010000064300109397

[2024-05-30T14:53:34.995Z] elapsed 00:07:01.05 | total o/s     1,618.82 | interval o/s     1,228.01 | tot          681,601 |          147 secs behind | resume token = 016658925b00001645010000164500109397

[2024-05-30T14:54:35.054Z] elapsed 00:08:01.17 | total o/s     1,416.55 | interval o/s         0.00 | tot          681,601 |          208 secs behind | resume token = 016658925b00001645010000164500109397

[2024-05-30T14:55:35.102Z] elapsed 00:09:01.22 | total o/s     1,259.39 | interval o/s         0.00 | tot          681,601 |          268 secs behind | resume token = 016658925b00001645010000164500109397

The logs suggest it gets to 681,601 documents and then the secs behind just keeps growing.

We start the tool to monitor the Change Streams and transfer documents, minutes before we start inserting documents into the DocDB, so it shouldn't be a case of those 52 documents have expired from the Change Stream.

We see the exact same thing with another collection where there are 340,375 documents in DocDB and the tool only moves 340,301 documents to MongoDB (missing 74)

Hey,

I did some more debugging today and figured out whats going on here....

The collections I was transferring from DocDB -> Sharded MongoDB were not getting any updates

The script buffers updates from the Change Stream and then Bulk Writes them if either:

However this check is only executed when an event arrives on the Change Stream

So because these collections are not being actively written to the final batch of documents are never written.

If I write a fake document to the collection, the script realises that more than 5 seconds have passed since the last write and flushes the buffer of documents

I guess not really a bug, more a misunderstanding of how its supposed to work, so closing this