paralleldrive/cuid2

Some questions for a port to Go

nlepage opened this issue · 11 comments

Hello,

I discovered cuid2 (and cuid) today, and I gave a try at porting it to Go: https://github.com/nlepage/go-cuid2

It seems to be working fine, here are some ids generated with default length:

mhsx3pf2p8ocd2nlaioilwah
jt06wcr8l51t7wj9bxsdb0al
fvqdusrnkcohfi68gor7pooa
sdgqj4r3j0om523ykm0nch85
qw60k14c0kq14gnqjis8nh9f

I have some questions:

  • In createEntropy(), I'm calling a random prime number generator from the Go std library instead of picking in a predefined list of prime numbers
    • is ot OK?
    • isn't it too expensive? an alternative could be to generate a random prime numbers list once, then pick in this list when createEntropy() is called
  • In createFingerprint(), I don't have an equivalent of Object.keys(globalObj).toString()) to give to the hash function, is it a problem?

Thanks for your help!

You don't need to do the random prime thing in createEntropy. I will eventually pull it out of the JS version.

If you don't have host-unique global entropy to use as a fingerprint, you should find another source of host fingerprint entropy (e.g. hostname, pid). You should also parametrize it in case whatever you pick doesn't work for some users.

What's the status on your port?

Hi @ericelliott
Thanks for your responses.

I'll try to integrate the simplified createEntropy into my port this week.

Regarding the host-unique global entropy, I'll try to build a fingerprint using some host information as you suggested.

Looking forward to it! 😎

Is this done? Can we add the Go port to the docs?

I made the changes, you can see the diff here nlepage/go-cuid2@3097e1d...c4d3129

For the fingerprint creation I used the OS environment, and added the hostname and pid to it.

I have some questions left:

  • In the JS version, why don't you give the random source to createFingerprint()? It calls createEntropy() which as a result uses Math.random()
  • Why are you using Object.keys() on globalObj? Wouldn't it be better to use Object.entries()?

In the JS version, why don't you give the random source to createFingerprint()? It calls createEntropy() which as a result uses Math.random()

This is a bug. 🐛

Why are you using Object.keys() on globalObj? Wouldn't it be better to use Object.entries()?

A few reasons:

  1. Keys usually supply sufficient cross-host entropy in JavaScript, because most hosts have different global keys, especially when paired with the random entropy.
  2. Values can grow extremely large and be many different types, which may or may not map well to strings, which could waste a lot of time and potentially cause errors.
  3. Security: Values may contain secrets that we don't want to potentially leak. Our entropy smoothie should be enough to prevent that from happening, but we should follow the principle of least knowledge here. Don't ask for information you don't actually need to do the job.

Okay, so using the system environment variables' keys and values as I did in the Go port is probably not a good idea (same potentially large values and security risk as Object.entries()).
I'm going to change that and use only the environment variables' keys.

One minor question : in bufToBigInt(), after shifting the number 8 bits left, I'm using a binary or (| operator) instead of an addition, I think it's OK, and might be a little faster, can you confirm?

Appart from that I think the port is OK and can be added to the list.
I still need to write some docs and more tests though.

Also is there a maximum length for generated ids?
What I see in practice is around 97/98 characters.

Okay, so using the system environment variables' keys and values as I did in the Go port is probably not a good idea (same potentially large values and security risk as Object.entries()).
I'm going to change that and use only the environment variables' keys.

Good call.

One minor question : in bufToBigInt(), after shifting the number 8 bits left, I'm using a binary or (| operator) instead of an addition, I think it's OK, and might be a little faster, can you confirm?

This sounds like it should be equivalent, but I wouldn't value a tiny micro-optimization over the potential to introduce a subtle bug in the crypto. Especially since compilers tend to automatically make optimizations like that anyway. Did you profile it?

Also is there a maximum length for generated ids? What I see in practice is around 97/98 characters.

I went with a conservative 32 char max length pretty arbitrarily, but mostly because ids should be long enough to be practically impossible to guess or extract entropy from - but not more. Excessive size causes other problems, such as storage, transport, and usability concerns. Users who want more probably need things like secret keys, or other things this standard was not designed for. They should probably be using a purpose-built, cryptographically secure algorithm. Better to have a sane range that solves the common case for 99.9% of users, IMO.