Fingerprints for non-JS ports
mplanchard opened this issue · 3 comments
Working on adding cuid2 to the Rust cuid port, and trying to figure out how to do the fingerprint.
The JS version is a hash of:
- random number from 2063-~4126
- stringified object keys from the global object, which is either global (in node) or window (in browser)
In Rust, we don't have anything like the global object in node or the window object in the browser. So far, I've got:
- same random number
- process ID
- thread ID
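A minimal sketch of combining those three inputs, using only the standard library. The names `random_component` and `fingerprint` are hypothetical, `DefaultHasher` is only a stand-in for whatever hash the port actually uses, and `RandomState` is borrowed as a cheap randomness source rather than a proper RNG:

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};
use std::process;
use std::thread;

/// Hypothetical helper: a pseudo-random number in 2063..=4125.
/// `RandomState::new()` carries randomly seeded keys, so finishing an
/// empty hasher built from it gives an unpredictable u64. This is an
/// assumption made to avoid external crates, not a real RNG.
fn random_component() -> u64 {
    let seed = RandomState::new().build_hasher().finish();
    2063 + seed % 2063
}

/// Hypothetical helper: hash the random number, process ID, and thread ID
/// into a 16-character hex string.
fn fingerprint() -> String {
    // Stand-in hash; a real port would likely use a stronger function.
    let mut hasher = DefaultHasher::new();
    random_component().hash(&mut hasher);
    process::id().hash(&mut hasher);
    thread::current().id().hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```

Calling `fingerprint()` from two different threads feeds different thread IDs into the hash, which is what separates fingerprints within a single process.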
That gives different fingerprints for different processes & threads generating CUIDs on the same system, but doesn't guarantee anything across systems.
It looks like the Python port uses the system hostname, but that would reduce portability and prevent compiling the Rust to target-independent WASM.
One option that springs to mind is environment variables: the specific env var keys and values available to the process are likely to vary a fair bit across systems. On Docker, this will include the HOSTNAME env var, which is generally set to the container ID. This is what I'm defaulting to for the moment, but would be curious to hear your thoughts.
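A rough sketch of the env var idea, assuming the keys and values are sorted and then hashed together. `env_component` is a hypothetical name, and `DefaultHasher` again stands in for the real hash:

```rust
use std::collections::hash_map::DefaultHasher;
use std::env;
use std::ffi::OsString;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: reduce the process environment to a single u64.
fn env_component() -> u64 {
    // `vars_os` avoids the panic `vars` raises on non-Unicode values.
    let mut vars: Vec<(OsString, OsString)> = env::vars_os().collect();
    // Sort so the result does not depend on iteration order.
    vars.sort();
    let mut hasher = DefaultHasher::new();
    vars.hash(&mut hasher);
    hasher.finish()
}
```

The value is stable for as long as the environment is unchanged, so it would be mixed into the fingerprint alongside the random number rather than replacing it.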
We could also just rely on the random number, process ID, thread ID, and the hash entropy.
Be careful with env vars. How will those be allocated across different environments?
It is generally OK if these values CAN collide across hosts, as long as that is unlikely. In CUID, I often used multiple sources of host entropy to create fingerprints less likely to collide.
Hmm, I guess whether env vars are appropriate would depend on what the purpose of the fingerprint portion of the CUID is and when it's intended to vary.
My assumption is that it should be as unique as possible for any given "instance" of a process/thread producing CUIDs. So if I have 10 machines running 10 docker containers, with each container spinning up 2 processes with 2 threads each, I'd expect we'd want 10 * 10 * 2 * 2 = 400 unique fingerprints going into the CUIDs, to help ensure that no two instances can ever generate duplicate IDs.
My worry with just including (random number + proc ID + thread ID) + hash_entropy is that the (random number + proc ID + thread ID) seems quite likely to overlap eventually given enough systems. The added entropy from the hash function plus the additional entropy in the CUID inputs may be enough to take care of it, but it seems like it'd be safer to try to include something more system-specific. That said, it turns out env vars aren't available in WASM builds anyway, so that rules them out, unless I use them on non-WASM builds and fall back to something else for WASM.
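One way the non-WASM / WASM split could look, sketched with conditional compilation. The `system_component` name, the `target_arch = "wasm32"` gate, and the constant fallback are all assumptions for illustration:

```rust
/// Hypothetical helper for non-WASM targets: hash the sorted env vars.
#[cfg(not(target_arch = "wasm32"))]
fn system_component() -> u64 {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    let mut vars: Vec<_> = std::env::vars_os().collect();
    vars.sort();
    // Stand-in hash, not necessarily what the port uses.
    let mut hasher = DefaultHasher::new();
    vars.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical fallback for WASM targets: contribute nothing
/// system-specific and rely on the other fingerprint inputs.
#[cfg(target_arch = "wasm32")]
fn system_component() -> u64 {
    0
}
```

Only one of the two definitions is compiled for a given target, so callers can use `system_component()` without caring which build they are in.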
Experimentally, it seems like the random data plus proc and thread IDs will probably generally be sufficient. Can update later if it isn't.