uuidjs/uuid

Question: Is mixing uuidv4 and uuidv5 collision safe?

mfbx9da4 opened this issue · 4 comments

I have a key value store where rows are indexed by uuid. For some of the keys I generate the key with uuidv4. For some of the other keys I generate the key from two other uuids from that same key value store.

e.g.

key value
2efd5459-fa72-4b28-a801-160e84fa049d alice
99a975d8-cadf-460a-b9ab-ce8352414d89 bob
2fc821c5-fa09-5a89-b355-e6d3b5b90fc8 alice_bob

The alice_bob row was generated via v5 of alice and bob's uuids.

//    alice                                   bob
u.v5('2efd5459-fa72-4b28-a801-160e84fa049d', '99a975d8-cadf-460a-b9ab-ce8352414d89')

Am I more likely to get a collision by mixing v4 and v5 in the same KV store than if I just used v4 or just used v5?

Am I likely to get a collision by mixing v4 and v5 in the same KV store than if I just used v4 or just used v5?

RFC4122 UUIDs have the version encoded in them, so [properly formed] v5 UUIDs will never collide with a v4 UUID, and vice-versa.

Outside of that, the odds of collision depend on the behavior of the respective UUID versions. The odds of v4 UUIDs is pretty well documented elsewhere. (tl;dr "vanishingly small"). v5 ids are deterministic hashes, so it mostly depends on the odds of you having the same input names, which isn't something we have control over. (There is a theoretical chance two different names will result in the same hash but, again, the odds of that are vanishingly small.)

image

Thanks very much for your response @broofa

At a second glance I found this question actually pretty interesting: in theory SHA1 hashes that are being used for v5 UUIDs also have a collision risk… I started wondering how that would compare to the collision risk of two v4 UUIDs 🤔

I have to admit that I stopped my thought process at this point and did not dig further.

@ctavan: Agreed, it is an interesting problem. I'm not really sure how to quantify it beyond pointing out that v4 and v5 UUIDs are generated in basically the same fashion. The only difference being the version # and the source of entropy. v4 uses an RNG, v5 uses SHA1 (to hash an input name & namespace). In both cases, the IDs have 122 bits of entropy.

As you know, I did some brainstorming a while back to figure out collision probabilities for v4, which I've just revisited to determine the # of UUIDs you'd have to generate for a 50% chance of collision. (Answer: 2.71+e18)

That assumes a "perfect" source of entropy, of course. For v4, the use of CSPRNGs is "close enough" to not warrant further investigation at this time. With v5, though, I know SHA1 is now deprecated for use in security contexts (choosen-prefix attacks are now feasible) but it's unclear to me what, if any, impact that has on v5 UUIDs. The RFC specifically enjoins the use of UUIDs for security purposes:

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities

Analyzing the quality of hash algorithms is a notoriously difficult problem. It does appear SHA1 has "excellent confusion and diffusion characteristics", however.

... and this is the point at which I stop and just throw my hands up in the air. Evaluating this further is going to take someone with better cryptographic chops than I.