Fingerprints for non-JS ports

Question

mplanchard opened this issue 2 years ago · 3 comments

Working on adding cuid2 to the Rust cuid port, and trying to figure out how to do the fingerprint.

The JS version is a hash of:

- a random number in the range 2063 to ~4126
- the stringified object keys of the global object, which is either global (in Node) or window (in the browser)

In Rust, we don't have anything like the global object in node or the window object in the browser. So far, I've got:

- same random number
- process ID
- thread ID

That gives different fingerprints for different processes & threads generating CUIDs on the same system, but doesn't guarantee anything across systems.
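For reference, here is a minimal sketch of that combination using only the standard library. The names `random_component` and `fingerprint` are hypothetical, `DefaultHasher` stands in for whatever hash the real port uses, and `RandomState` is borrowed as a std-only entropy source where a real port would more likely use a proper RNG crate. It also assumes a target where `std::process::id()` is supported.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

/// Pseudo-random value in 2063..=4125, roughly the range described above.
/// Assumption: RandomState's random seeding is good enough for a sketch;
/// it is not a general-purpose RNG.
fn random_component() -> u64 {
    let seed = RandomState::new().build_hasher().finish();
    2063 + seed % 2063
}

/// Hypothetical fingerprint: a hash over the random number, the process ID,
/// and the current thread ID. DefaultHasher is a stand-in, not the hash
/// used by any actual cuid2 implementation.
fn fingerprint() -> u64 {
    let mut hasher = DefaultHasher::new();
    random_component().hash(&mut hasher);
    std::process::id().hash(&mut hasher);
    std::thread::current().id().hash(&mut hasher);
    hasher.finish()
}
```

Calling `fingerprint()` from two threads of the same process feeds different thread IDs (and different random components) into the hash, but nothing in it identifies the host.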

It looks like the Python port uses the system hostname, but that would reduce portability and prevent compiling the Rust to target-independent WASM.

One option that springs to mind is environment variables: the specific env var keys and values available to the process are likely to vary a fair bit across systems. On Docker, this will include the HOSTNAME env var, which is generally set to the container ID. This is what I'm defaulting to for the moment, but I would be curious to hear your thoughts.

We could also just rely on the random number, process ID, thread ID, and the hash entropy.

Answer 1 · 2023-01-17T22:48:56.000Z

Be careful with env vars. How will those be allocated across different environments?

It is generally OK if these values CAN collide across hosts, as long as that is unlikely. In CUID, I often used multiple sources of host entropy to create fingerprints less likely to collide.

Answer 2 · 2023-01-18T00:19:55.000Z

Hmm, I guess whether env vars are appropriate would depend on what the purpose of the fingerprint portion of the CUID is and when it's intended to vary.

My assumption is that it should be as unique as possible for any given "instance" of a process/thread producing CUIDs. So if I have 10 machines each running 10 Docker containers, with each container spinning up 2 processes with 2 threads each, I'd expect we'd want 10 * 10 * 2 * 2 = 400 unique fingerprints going into the CUIDs, to help ensure that no two instances can ever generate duplicate IDs.

My worry with just including (random number + proc ID + thread ID) + hash_entropy is that the (random number + proc ID + thread ID) seems quite likely to overlap eventually given enough systems. The added entropy from the hash function plus the additional entropy in the CUID inputs may be enough to take care of it, but it seems like it'd be safer to try to include something more system-specific. That said, it turns out env vars aren't available in WASM builds anyway, so that rules them out, unless I use them on non-WASM builds and fall back to something else for WASM.
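To make the last option concrete, here is a rough sketch of the conditional-compilation approach: hash the environment on non-WASM targets and fall back to a constant on WASM. The function name `host_entropy` is hypothetical, `DefaultHasher` is again a stand-in hash, and gating on `target_arch = "wasm32"` is one assumed way to detect a WASM build.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Non-WASM builds: hash all environment variable keys and values.
/// The pairs are sorted first so the result does not depend on the
/// order in which the OS reports them.
#[cfg(not(target_arch = "wasm32"))]
fn host_entropy() -> u64 {
    // vars_os() avoids the panic that vars() raises on non-UTF-8 values.
    let mut vars: Vec<(std::ffi::OsString, std::ffi::OsString)> =
        std::env::vars_os().collect();
    vars.sort();
    let mut hasher = DefaultHasher::new();
    vars.hash(&mut hasher);
    hasher.finish()
}

/// WASM builds: no environment to read, so contribute nothing here and
/// rely on the other fingerprint inputs instead.
#[cfg(target_arch = "wasm32")]
fn host_entropy() -> u64 {
    0
}
```

The return value would be mixed into the fingerprint alongside the random number, process ID, and thread ID. On WASM the fingerprint then reduces to whatever those other inputs provide.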

Answer 3 · 2023-01-22T01:19:53.000Z

Experimentally, it seems like the random data plus process and thread IDs will generally be sufficient. I can update this later if that turns out not to be the case.