Using IdGen for migrating legacy data
nothings-more opened this issue · 4 comments
Hello, I have a couple of questions about using IdGen for a project with legacy data:
- Is it correct that, to avoid collisions between IDs generated by two instances of an "IdGenerator", it is enough to use different "generator-id-part" values for those instances?
- With legacy data that has DB-generated IDs (auto-increment identity), when migrating to client-generated IDs using "IdGen" we should at least verify that the maximum ID in the existing data is less than the minimum ID that "IdGen" will generate for new records (with a particular epoch and "generator-id-part"). Do you have any other caveats or advice for such a use case?
Thanks in advance
- Yes. You'll want the generator-id part to be "globally" (or even "universally") unique; if you have multiple instances (be it on different hosts, on the same host in different processes, or even in the same process in different threads), then each of the instances will need a unique generator-id.
- You're correct: as long as the max ID is less than the IDs generated by IdGen at that time, you should be OK. You can play with the epoch date if you want to; the closer you get it to "today", the lower your IDs will be. However, once you pick an epoch you'll need to stick with it.
You can consider an epoch of 2020-01-01 or 2021-06-01 or 2021-09-01, for example; with each later epoch the generated IDs start lower (and then, of course, increase with each generated ID) and probably get closer to your current maximum ID. But the "you'll need to stick with it" goes for the entire IdGeneratorOptions and IdStructure. I'm not saying it's impossible, but it will be a real drag to change this later on (for example, to have more generator-id bits). So make sure you've thought things through for your specific situation and requirements.
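To make the "max legacy ID must be below the first generated ID" check concrete, here is a rough Python sketch of the arithmetic. It does not use the IdGen library. It assumes the default split implied by the figures in this thread (41 timestamp bits, 10 generator-id bits, 12 sequence bits), a millisecond tick, and a layout with the timestamp in the highest bits followed by generator-id and sequence. The names `compose_id`, `min_id_at` and `is_safe_cutover` are made up for illustration; verify the layout against the README before relying on it.

```python
from datetime import datetime, timedelta, timezone

# Assumed default structure: 41 + 10 + 12 = 63 bits.
TIMESTAMP_BITS = 41
GENERATOR_BITS = 10
SEQUENCE_BITS = 12


def compose_id(timestamp_ms: int, generator_id: int, sequence: int) -> int:
    """Pack the three parts into one integer (assumed layout: timestamp highest)."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range for the chosen epoch")
    if not 0 <= generator_id < (1 << GENERATOR_BITS):
        raise ValueError("generator id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (GENERATOR_BITS + SEQUENCE_BITS))
        | (generator_id << SEQUENCE_BITS)
        | sequence
    )


def min_id_at(now: datetime, epoch: datetime, generator_id: int = 0) -> int:
    """Smallest ID a generator could produce at `now` (sequence = 0)."""
    elapsed_ms = (now - epoch) // timedelta(milliseconds=1)
    return compose_id(elapsed_ms, generator_id, 0)


def is_safe_cutover(legacy_max_id: int, now: datetime, epoch: datetime,
                    generator_id: int = 0) -> bool:
    """True if every ID generated from `now` on exceeds the legacy maximum."""
    return legacy_max_id < min_id_at(now, epoch, generator_id)


# Hypothetical numbers: epoch 2021-09-01, cutover one day later,
# legacy table with ten million auto-increment rows.
epoch = datetime(2021, 9, 1, tzinfo=timezone.utc)
cutover = epoch + timedelta(days=1)
print(min_id_at(cutover, epoch), is_safe_cutover(10_000_000, cutover, epoch))
```

Using generator-id 0 gives the most conservative bound, since any other generator-id only raises the result. The same function also shows why two generators with different generator-ids cannot collide: for an identical timestamp and sequence, the packed values differ in the generator-id bits.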
That's about all the advice I have, I guess... Feel free to ask follow-up questions if you have any. It's a lot less complicated than it seems.
Thanks for such thorough answers/explanations! And a couple of clarification questions from my side:
- Is it correct that using different "generator-id" parts is the only way to make several instances of IdGenerator generate unique sequences of IDs?
- As far as I understand, IdGenerator has a kind of "resolution", i.e. it can generate only a limited number of IDs per time period: 4096 per millisecond with default settings. That can be increased by changing the settings, but it will still be limited, right?
Thanks again.
- Yes. The README explains exactly how an ID is structured and generated.
- Yes, there's always a limit. You can sacrifice some generator-id bits, trading fewer (possible) generators for more IDs per time period: each bit added to the sequence part doubles the maximum sequence number for the period. You can even sacrifice some timestamp bits (but that'll decrease the resolution of your timestamp). But I'd say 4096 per millisecond is a whole lot. That's over 4 million per second per generator. With 10 generators in the system you can do 40 million IDs per second. With 1024 generators (the maximum in the default settings) you can do a theoretical maximum of 1024 * 4096 * 1000 = 4,194,304,000 IDs per second.
You can vary the number of bits in each part of the ID structure (timestamp, generator-id, sequence), but whatever you pick you'll have to live with once you start generating and using IDs in that configuration (or you'll have to renumber your IDs or do other trickery). As long as all 3 parts add up to 63 bits (and generator-id and sequence don't exceed 31 bits) you should be good to go.
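The trade-off above is plain arithmetic, which the following Python sketch spells out. It does not call IdGen. The function name `describe_structure` and the returned keys are invented for illustration. It assumes a millisecond tick, and it reads the 31-bit limit as applying to the generator-id and the sequence separately, which is one interpretation of the sentence above.

```python
def describe_structure(timestamp_bits: int, generator_bits: int,
                       sequence_bits: int, ticks_per_second: int = 1000) -> dict:
    """Validate a bit split and report the capacity it implies."""
    if timestamp_bits + generator_bits + sequence_bits != 63:
        raise ValueError("the three parts must add up to exactly 63 bits")
    # Assumption: the 31-bit cap applies to each part on its own.
    if generator_bits > 31 or sequence_bits > 31:
        raise ValueError("generator-id and sequence may not exceed 31 bits")

    max_generators = 1 << generator_bits
    ids_per_tick = 1 << sequence_bits
    per_generator = ids_per_tick * ticks_per_second
    return {
        "max_generators": max_generators,
        "ids_per_tick": ids_per_tick,
        "ids_per_second_per_generator": per_generator,
        "ids_per_second_total": per_generator * max_generators,
    }


# Default split discussed in the thread, then one generator-id bit
# moved to the sequence part.
default = describe_structure(41, 10, 12)
shifted = describe_structure(41, 9, 13)
print(default)
print(shifted)
```

Moving one bit from generator-id to sequence halves the number of possible generators and doubles the IDs per tick, so the theoretical system-wide total stays the same; only the per-generator ceiling changes.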
I see, thanks a lot