cowtowncoder/java-uuid-generator

What is a valid UUID

delanym opened this issue · 5 comments

I'm a little confused about what exactly constitutes a valid UUID.
As a baseline, it should represent a 128-bit value, so there is no confusion about D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A.

If I divide the octets like so
D1AF6FA2-BECF-4E-54AF6B-ABB8EE298A8A
then https://github.com/openjdk/jdk/blob/3c6459e1de9e75898a1b32a95acf684050fbe1af/src/java.base/share/classes/java/util/UUID.java#L242
is happy with this, but will throw IllegalArgumentException for
D1AF6FA2-BECF-4E54-AF-6B-ABB8EE298A8A
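
The difference between these inputs can be reproduced with a small probe. This is only a sketch: the class and method names (`FromStringProbe`, `accepts`) are invented for illustration, and the exception message differs between JDK versions.

```java
import java.util.UUID;

// Hypothetical helper, not part of the JDK or java-uuid-generator.
class FromStringProbe {
    // Returns true if UUID.fromString() parses the input without throwing.
    static boolean accepts(String s) {
        try {
            UUID.fromString(s);
            return true;
        } catch (IllegalArgumentException e) {
            // NumberFormatException is a subclass, so it is caught here too.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A",  // canonical 8-4-4-4-12
            "D1AF6FA2-BECF-4E-54AF6B-ABB8EE298A8A",  // four hyphens, unusual grouping
            "D1AF6FA2-BECF-4E54-AF-6B-ABB8EE298A8A"  // five hyphens
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + (accepts(s) ? "accepted" : "rejected"));
        }
    }
}
```

Note that "accepted" only means no exception was thrown; it says nothing about whether the parsed value preserves the hex digits of the input.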

Are AF and 6B not the clk_seq_hi_res and clk_seq_low mentioned in the spec https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.2 ?
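
For reference, the two clock-sequence octets can be read out of a parsed value. This is a sketch that assumes the RFC 4122 field layout, where `clk_seq_hi_res` and `clk_seq_low` are the top two octets of the least significant 64 bits; the class name `ClockSeqFields` is made up for the example.

```java
import java.util.UUID;

// Hypothetical helper for illustration only.
class ClockSeqFields {
    // Octet 8 of the UUID: clock_seq_hi_and_reserved (clk_seq_hi_res).
    static int clkSeqHiRes(UUID u) {
        return (int) (u.getLeastSignificantBits() >>> 56) & 0xFF;
    }

    // Octet 9 of the UUID: clock_seq_low.
    static int clkSeqLow(UUID u) {
        return (int) (u.getLeastSignificantBits() >>> 48) & 0xFF;
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A");
        // Prints: clk_seq_hi_res=AF clk_seq_low=6B
        System.out.printf("clk_seq_hi_res=%02X clk_seq_low=%02X%n",
                clkSeqHiRes(u), clkSeqLow(u));
    }
}
```

So AF and 6B are indeed two separate fields in the binary layout, even though the string form writes them together as one four-digit group.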

Where does it say that a UUID string must be 36 characters? Yet this is a requirement at https://github.com/cowtowncoder/java-uuid-generator/blob/3c01a59720d5d8fa1b31a25b913047bf19d8cbd7/src/main/java/com/fasterxml/uuid/impl/UUIDUtil.java#L93C114-L93C114

I consider the specific 36-character template, including the placement of hyphens, to be the canonical representation, and nothing else to be a valid UUID. Looking at the code, I am not sure why there is a fallback to accept any other representations -- that actually seems wrong.

As to "where does it say" -- it has been a while since I read the spec. But I would turn it around and ask for suggestions of where alternate representations are indicated as valid: there would need to be some specification that defines the allowed variations.

I would be interested in knowing if you have found something wrt validity constraints as I do not recall seeing general use of different groupings, or lengths.

Hmmh. As per https://en.wikipedia.org/wiki/Universally_unique_identifier section "Textual representations":

Because a UUID is a 128 bit label, it can be represented in different formats. 

and it offers some examples without many limitations. This is possibly why existing code supports other hyphen placements.

I think I would be open to allowing alternatives with different hyphen counts if (but only if) there is some documentation of widespread use of such alternative(s).

There's a bug in UUID.fromString() which (probably) will never be fixed because it has been this way for many years.

For example, the string "a-b-c-d-e" is interpreted by JDK's UUID as "0000000a-000b-000c-000d-00000000000e", but it's not a "canonical string".
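
One way to spot this kind of lenient parse is to compare the input with what `UUID.toString()` gives back. A minimal sketch, with `RoundTripCheck` and `isCanonical` as illustrative names rather than any existing API:

```java
import java.util.UUID;

// Hypothetical helper for illustration only.
class RoundTripCheck {
    // True only if parsing and re-printing reproduces the input (ignoring case).
    static boolean isCanonical(String s) {
        if (s == null) {
            return false;
        }
        try {
            return UUID.fromString(s).toString().equalsIgnoreCase(s);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The lenient parse pads each group with leading zeros.
        System.out.println(UUID.fromString("a-b-c-d-e"));
        System.out.println(isCanonical("a-b-c-d-e"));                            // false
        System.out.println(isCanonical("0000000a-000b-000c-000d-00000000000e")); // true
    }
}
```

The check relies on `toString()` always producing the 8-4-4-4-12 form, so any input that was padded or regrouped during parsing no longer matches its own output.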

There is a formal ABNF definition for UUID strings in RFC-4122:

      The formal definition of the UUID string representation is
      provided by the following ABNF [7]:

      UUID                   = time-low "-" time-mid "-"
                               time-high-and-version "-"
                               clock-seq-and-reserved
                               clock-seq-low "-" node
      time-low               = 4hexOctet
      time-mid               = 2hexOctet
      time-high-and-version  = 2hexOctet
      clock-seq-and-reserved = hexOctet
      clock-seq-low          = hexOctet
      node                   = 6hexOctet
      hexOctet               = hexDigit hexDigit
      hexDigit =
            "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
            "a" / "b" / "c" / "d" / "e" / "f" /
            "A" / "B" / "C" / "D" / "E" / "F"

ABNF translation:

  • 1st group has 8 hex digits;
  • 2nd group has 4 hex digits;
  • 3rd group has 4 hex digits;
  • 4th group has 4 hex digits;
  • 5th group has 12 hex digits;
  • It gives 36 chars (32 hex digits + 4 hyphens).
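
The group lengths listed above can be checked directly, without a regex. This is a sketch under the assumption that only the ASCII hex digits from the ABNF are allowed; `AbnfShapeCheck` and `matchesAbnf` are names invented here.

```java
// Hypothetical helper for illustration only.
class AbnfShapeCheck {
    // Hex digits per group, as in the list above: 8-4-4-4-12.
    private static final int[] GROUP_LENGTHS = {8, 4, 4, 4, 12};

    // Accepts exactly the hexDigit alternatives from the ABNF (ASCII only).
    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    static boolean matchesAbnf(String s) {
        if (s == null) {
            return false;
        }
        // Limit -1 keeps trailing empty groups, so "a-b-c-d-" still yields 5 parts.
        String[] groups = s.split("-", -1);
        if (groups.length != GROUP_LENGTHS.length) {
            return false;
        }
        for (int i = 0; i < groups.length; i++) {
            if (groups[i].length() != GROUP_LENGTHS[i]) {
                return false;
            }
            for (char c : groups[i].toCharArray()) {
                if (!isHexDigit(c)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesAbnf("D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A")); // true
        System.out.println(matchesAbnf("D1AF6FA2-BECF-4E-54AF6B-ABB8EE298A8A")); // false
        System.out.println(matchesAbnf("a-b-c-d-e"));                            // false
    }
}
```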

EDIT:
Just a word of advice for anyone reading this in the future: don't rely on UUID.fromString() for validation. Use UUIDUtil or a regex.
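
As a rough illustration of the regex route using only the JDK, the input can be matched against the 8-4-4-4-12 shape before it is handed to `UUID.fromString()`. The class name `StrictUuidParser` and its `parse` method are hypothetical, and the pattern is one possible transcription of the ABNF above, not code taken from `UUIDUtil`.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class StrictUuidParser {
    // 8-4-4-4-12 hex digits, either case; matches() anchors to the whole input.
    private static final Pattern CANONICAL = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // Returns the parsed UUID, or empty if the input is not in canonical form.
    static Optional<UUID> parse(String s) {
        if (s == null || !CANONICAL.matcher(s).matches()) {
            return Optional.empty();
        }
        return Optional.of(UUID.fromString(s));
    }

    public static void main(String[] args) {
        System.out.println(parse("D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A").isPresent()); // true
        System.out.println(parse("a-b-c-d-e").isPresent());                            // false
    }
}
```

Returning `Optional` rather than throwing is just a design choice for the sketch; a caller that prefers exceptions could throw `IllegalArgumentException` on a failed match instead.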

How did I miss the ABNF in the spec? Thanks for taking a look at this.

Thank you @fabiolimace for the excellent summary and recommendations!