uuidjs/uuid

Collision with v4 uuid

prm-dan opened this issue · 13 comments

Describe the bug

I'm guessing I'm setting up something incorrectly. I'm using browser-side generated uuids and logging them to the server. I have a collision even though I have only logged about 100k uuids. I shouldn't see a collision with 100k uuids, so I'm guessing I have a bug in my code. I'm assuming that I need to polyfill or something.

How to reproduce

I don't have a small repro. This is integrated into a larger app. I noticed a collision pretty quickly.

import { v4 as uuidv4 } from 'uuid';
...
log(uuidv4())

Expected behavior

I would not expect a collision with 100k uuids.

Runtime

  • OS: Unknown.
  • Runtime: I don't know for sure. Almost all of the users use either Chrome or Safari (50-50).
  • Runtime Version: 14.x


Without knowing more about how the duplicate UUIDs are being generated it's difficult to provide meaningful guidance.

For starters, are the UUIDs being generated on your server or clients? If the latter, is it possible requests are being sent more than once? E.g. Do your clients have retry logic for network requests?

FWIW, it's almost certainly not an issue with code in this module, per se. On the rare occasion we see duplicates it's either a bona fide bug in OP's code or an external issue.
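
For a sense of scale, here's a rough birthday-bound estimate (my own back-of-the-envelope math, not something from the library docs) of how unlikely a genuine RNG collision is at 100k IDs:

// Each v4 UUID carries 122 random bits. By the birthday approximation,
// the chance of any collision among n IDs is roughly n^2 / (2 * 2^122).
const n = 1e5;
const p = (n * n) / (2 * 2 ** 122);
console.log(p); // ≈ 9.4e-28, i.e. effectively impossible, so duplicates point to something other than the RNG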

These UUIDs are generated on the browser-side. These were generated on two different browsers (different userIds and IPs).

There aren't any known issues with any specific browsers?

Not at this time.

What version of the uuid library are you using? In versions before 7.0, v4 uuids could fall back to a Math.random()-based random number generator. I recall that this caused duplicate UUIDs for requests from the Google crawler (Googlebot user agent), since the crawler apparently uses a special Math.random() implementation that produces deterministic results.

That said, versions >= 7.0 of this library no longer fall back to Math.random() and instead always use the WebCrypto API. I have no idea if e.g. Googlebot (or other bots) exhibit similar behavior for the WebCrypto APIs.

Is your website publicly available? Could the duplicate UUIDs maybe be caused by such bots that use custom WebCrypto implementations?
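
For illustration, here is a minimal sketch (not the library's actual source) of the two random-byte sources being contrasted above. If Math.random() is seeded deterministically, as it apparently is on some crawlers, the first source yields repeating byte sequences and therefore repeating v4 UUIDs:

// Pre-7.0 fallback style: derive bytes from Math.random().
function mathRandomBytes() {
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = Math.floor(Math.random() * 256);
  }
  return bytes;
}

// 7.0+ style: always use the WebCrypto API.
function webCryptoBytes() {
  return crypto.getRandomValues(new Uint8Array(16));
}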

Hmm, I'm using v8.3.1. It could be a crawler. I'll have to add more logging to see.

That would be great! Would be very curious to hear about your results.

Can you post some of the duplicate IDs here, just for good measure? It'd be good to confirm they're actually duplicates, and actually v4 ids. Sorry, but we've had this sort of issue before, where duplicates that weren't actually duplicates were reported. (This was with v1 ids, which are easier to make that mistake with, though.)
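
If it helps, the validate and version helpers added in uuid 8.3 can be used to double-check that an ID is at least a well-formed v4 UUID (a quick sketch, not a substitute for confirming the duplicates themselves):

import { validate, version } from 'uuid';

// True only for a syntactically valid UUID whose version nibble is 4.
function isV4(id) {
  return validate(id) && version(id) === 4;
}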

Yes, the user agents are listed as Googlebot and ad bots.

Here are a couple UUIDs:
5f1b428b-53a5-4116-b2a1-2d269dd8e592
a5511632-a12d-469d-98e5-92152fc0ae43

Yea, I think this is a bug with Snowplow Analytics using an old version of uuid.
https://discourse.snowplowanalytics.com/t/google-bot-sending-the-same-uuids-back-across-multiple-pages/4827/3

Ah, someone at Snowplow said they haven't updated partially because of uuid dropping support for IE 9 and 10.

This is a tricky one to solve (I’ve been pondering it given the impending major release of the JS tracker). Upgrading to uuid v8.x means we’d have to drop IE 9 and 10 support, which might seem tempting at first, but the Snowplow JavaScript Tracker needs to work in as many places as possible so users can understand where all their traffic is coming from; this means the tracker needs to support the widest possible range of browsers. It’s also hard (although not impossible, as I could wrap it) to create an IE 9/10-compatible version of the tracker because the uuid library isn’t API compatible between v3 and v8. I’ve been trying not to create an IE 9/10-specific version, as I don’t really want to live in a world with different code paths (a maintenance and debugging nightmare waiting to happen).
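
For context (not part of the quoted post), the API incompatibility mentioned above is roughly the move away from deep requires toward named exports; this is an illustrative comparison, not Snowplow's actual code:

// uuid v3.x style (per-version deep require):
const uuidv4Old = require('uuid/v4');

// uuid v7+/v8.x style (named exports; deep requires were removed):
const { v4: uuidv4 } = require('uuid');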

FWIW, I just tested crypto.getRandomValues() behavior on Googlebot and it is also deterministic(!) (<-- @ctavan @bcoe, take note!), so upgrading to uuid@latest probably won't fix this issue for Snowplow.
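
For anyone wanting to reproduce that check, here is a rough sketch of the idea. It assumes a page the crawler fetches and a hypothetical /log endpoint you control; identical samples across distinct crawler visits indicate a deterministic RNG:

// Collect a small sample of WebCrypto randomness and report it home.
const sample = crypto.getRandomValues(new Uint8Array(16));
fetch('/log', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ ua: navigator.userAgent, sample: Array.from(sample) }),
});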

I suppose one workaround would be to detect when we're running in a Googlebot client and "be smart", but the options there are limited. Google clearly wants Googlebot to be deterministic. Working around that won't be easy.

We could throw an error when run in googlebot clients, similar to when we don't find getRandomValues(). But I'm not convinced that's the right thing to do. A lot of our users will find it surprising and obnoxious.

Both of the above solutions start us down the slippery slope of detecting clients with known RNG issues. I'm not thrilled about that idea.

Googlebot and its ilk are a special case, where ID collisions are arguably by design. I'm not sure it's our responsibility to deal with that. I'd be okay with just adding a cautionary note to the README about this, and telling project owners it's up to them to figure it out.
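
As a sketch of what "figuring it out" could look like on the project-owner side (purely illustrative, with a made-up user-agent pattern and the log helper from the original report), IDs generated by known crawlers could simply be skipped or flagged before logging:

// Treat IDs from known crawler user agents as untrusted, since their RNGs
// may be deterministic. The pattern below is only an example.
const BOT_UA = /googlebot|adsbot|bingbot|crawler|spider/i;

function logClientId(id) {
  if (BOT_UA.test(navigator.userAgent)) {
    return; // skip (or flag) bot-generated IDs
  }
  log(id);
}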

Yea, updating the README sounds great.

I think this is working as intended and updating the docs is the best thing we can do!

(and thanks for verifying crypto behavior, @broofa!)