salesforce/mirus

MirrorMaker migration documentation

OneCricketeer opened this issue · 7 comments

Regarding this statement in the Medium post:

Mirus completely replaced Mirror Maker across all production data-centers at Salesforce in April 2018. Since then our data volumes have continued to grow.

For those who are running MirrorMaker, have active consumer group offsets for their data, and would prefer not to get duplicates after starting Mirus: is there migration documentation available, or a run-book that Salesforce used for the replacement?

No documentation available yet, but I will put something together based on our experience at Salesforce.

+1 :-)

@mtrienis Still on my todo list. The short version is that we shut down Mirror Maker, grabbed its committed consumer group offsets using kafka-consumer-groups.sh, then used bin/mirus-offset-tool.sh with the --reset-offsets and --from-file flags to initialize the Mirus connector offsets. When Mirus started, it picked up where Mirror Maker left off with no duplicates.

For the first few clusters we actually left Mirror Maker running in parallel with Mirus for a few minutes and accepted the duplicates, just to confirm everything was working as expected. We still used mirus-offset-tool.sh to initialize the offsets, to avoid a flood of duplicates.
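The offset-conversion step in that run-book (turning kafka-consumer-groups.sh output into a file for --from-file) is not spelled out above. Below is a minimal POSIX sh sketch of it. It assumes the usual kafka-consumer-groups.sh --describe table layout, with columns located by the header names TOPIC, PARTITION and CURRENT-OFFSET. It also assumes MirrorMaker's committed offsets can be copied unchanged into the Mirus "offset" field, as the comment above implies. The function name mm_offsets_to_mirus_json and its connector-name argument are hypothetical, not part of Mirus or Kafka.

```shell
# Hypothetical helper (not part of Mirus or Kafka): convert the table printed by
#   kafka-consumer-groups.sh --describe --group <mirrormaker-group>
# into the JSON-lines format that mirus-offset-tool.sh reads via --from-file.
# Assumption: MirrorMaker's committed offset is copied unchanged into "offset".
mm_offsets_to_mirus_json() {
  # $1 is the Mirus connector name (placeholder); the table is read from stdin.
  awk -v cid="$1" '
    # Header row: record which column holds each field we need.
    /CURRENT-OFFSET/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "TOPIC") t = i
        else if ($i == "PARTITION") p = i
        else if ($i == "CURRENT-OFFSET") o = i
      }
      next
    }
    # Data rows after the header: keep partitions with a numeric committed
    # offset. A "-" means nothing was committed, so that row is skipped.
    t && $p ~ /^[0-9]+$/ && $o ~ /^[0-9]+$/ {
      # Offset is printed as the original string to avoid awk number formatting.
      printf "{\"connectorId\":\"%s\",\"topic\":\"%s\",\"partition\":%s,\"offset\":%s}\n", cid, $t, $p, $o
    }
  '
}

# Example pipeline (angle brackets are placeholders); stop MirrorMaker first:
#   kafka-consumer-groups.sh --bootstrap-server <source-broker:9092> \
#     --describe --group <mirrormaker-group> \
#     | mm_offsets_to_mirus_json <mirus-connector-name> > mm-offsets.json
#   bin/mirus-offset-tool.sh --properties-file config/<worker.properties> \
#     --reset-offsets --from-file mm-offsets.json
```

Locating columns by header name rather than fixed position is meant to cope with Kafka versions that do or do not print a leading GROUP column. Partitions with no committed offset are silently dropped, so it is worth reviewing mm-offsets.json before running the reset.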

Idea:
Could MirusOffsetTool be extended to include the offset-listing functionality of ConsumerGroupCommand, so that two scripts wouldn't be needed?

@pdavidson100 @Cricket007 Can you please share a sample file, or the format of the file, that we supply to MirusOffsetTool with the --from-file flag for resetting offsets?
I'm getting a "not a valid Long value" exception when I try to reset offsets.

@Hari4AMQ The --from-file format is identical to the output format generated by --describe, and supports both CSV and JSON (recommended for setting offsets). For example, if you're setting the offsets of a 4-partition topic to 100, the file might look like this:

{"connectorId":"connector-id","topic":"topic-name","partition":0,"offset":100}
{"connectorId":"connector-id","topic":"topic-name","partition":1,"offset":100}
{"connectorId":"connector-id","topic":"topic-name","partition":2,"offset":100}
{"connectorId":"connector-id","topic":"topic-name","partition":3,"offset":100}
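Rather than typing each line by hand, a file like the one above can be generated. Below is a minimal sh sketch, assuming every partition gets the same offset; the function name mirus_offsets_for_topic is hypothetical. The "not a valid Long value" error reported above suggests the tool parses offsets as Java long values. On that assumption, the sketch rejects anything that is not a plain non-negative integer before it reaches mirus-offset-tool.sh.

```shell
# Hypothetical helper: print one Mirus offset record per partition, all set to
# the same offset, in the JSON-lines layout shown above.
# Usage: mirus_offsets_for_topic <connector-id> <topic> <partition-count> <offset>
mirus_offsets_for_topic() {
  connector_id=$1 topic=$2 partitions=$3 offset=$4
  # Reject non-integer input early (assumption: the tool parses these as longs).
  for n in "$partitions" "$offset"; do
    case $n in
      ''|*[!0-9]*) echo "not a non-negative integer: '$n'" >&2; return 1 ;;
    esac
  done
  p=0
  while [ "$p" -lt "$partitions" ]; do
    printf '{"connectorId":"%s","topic":"%s","partition":%d,"offset":%s}\n' \
      "$connector_id" "$topic" "$p" "$offset"
    p=$((p + 1))
  done
}

# Example: reproduce the four records above and save them for --from-file.
#   mirus_offsets_for_topic connector-id topic-name 4 100 > topic-name-offsets.json
```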

As @pdavidson100 mentioned, you should use the --describe option first and then edit the output file to set the offsets you need for the partitions you want. This is the command I would use to get the offsets for topic t1:

bin/mirus-offset-tool.sh --properties-file config/<worker.properties> --describe  --format json | grep "\"topic\":\"t1\"" > t1-offsets.json

Then edit t1-offsets.json to set the desired offsets.
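The edit step can also be scripted. Below is a minimal sh sketch. It assumes the compact one-record-per-line layout shown in this thread, where "partition" is immediately followed by "offset". The function name set_mirus_offset is hypothetical. If your tool's output is laid out differently, the sed pattern simply matches nothing and the line passes through unchanged.

```shell
# Hypothetical helper: set the "offset" for one partition in a file produced by
#   bin/mirus-offset-tool.sh ... --describe --format json
# Assumes the compact layout ..."partition":N,"offset":M... shown in this thread.
# Usage: set_mirus_offset <partition> <new-offset> < in.json > out.json
set_mirus_offset() {
  partition=$1 offset=$2
  for n in "$partition" "$offset"; do
    case $n in
      ''|*[!0-9]*) echo "not a non-negative integer: '$n'" >&2; return 1 ;;
    esac
  done
  # The comma after the partition number stops partition 1 from matching 10, 11, ...
  sed "s/\"partition\":$partition,\"offset\":[-0-9]*/\"partition\":$partition,\"offset\":$offset/"
}

# Example: move partition 2 of t1 to offset 500, leaving other lines untouched.
#   set_mirus_offset 2 500 < t1-offsets.json > t1-offsets-edited.json
```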