What is a valid UUID
delanym opened this issue · 5 comments
I'm a little confused about what exactly constitutes a valid UUID.
As a baseline, it should represent a 128-bit value, so there is no confusion about D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A
If I divide the octets like so
D1AF6FA2-BECF-4E-54AF6B-ABB8EE298A8A
then https://github.com/openjdk/jdk/blob/3c6459e1de9e75898a1b32a95acf684050fbe1af/src/java.base/share/classes/java/util/UUID.java#L242 is happy with this, but will throw IllegalArgumentException for
D1AF6FA2-BECF-4E54-AF-6B-ABB8EE298A8A
Are AF and 6B not the clk_seq_hi_res and clk_seq_low mentioned in the spec (https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.2)?
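For reference, RFC 4122 section 4.1.2 places clk_seq_hi_res at octet 8 and clk_seq_low at octet 9, which are the two highest bytes of the least significant 64 bits. A minimal sketch of reading them from a java.util.UUID follows; the class and method names are hypothetical and belong to neither the JDK nor java-uuid-generator.

```java
import java.util.UUID;

// Hypothetical helper for illustration only.
final class ClockSeqFields {
    // clk_seq_hi_res is octet 8: the top byte of the least significant 64 bits.
    static int clkSeqHiRes(UUID uuid) {
        return (int) (uuid.getLeastSignificantBits() >>> 56) & 0xFF;
    }

    // clk_seq_low is octet 9: the next byte down.
    static int clkSeqLow(UUID uuid) {
        return (int) (uuid.getLeastSignificantBits() >>> 48) & 0xFF;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A");
        System.out.printf("clk_seq_hi_res=%02X clk_seq_low=%02X%n",
                clkSeqHiRes(uuid), clkSeqLow(uuid));
    }
}
```

For the canonical string above, the two fields come out as AF and 6B, so they are indeed the two octets of the fourth group.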
Where does it say that a UUID string must be 36 characters? Yet this is a requirement at https://github.com/cowtowncoder/java-uuid-generator/blob/3c01a59720d5d8fa1b31a25b913047bf19d8cbd7/src/main/java/com/fasterxml/uuid/impl/UUIDUtil.java#L93C114-L93C114
I consider the specific 36-character template, including the placement of hyphens, to be the canonical representation, and nothing else to be a valid UUID. Looking at the code, I am not sure why there is a fallback that accepts any other representation; that actually seems wrong.
As to "where does it say": it has been a while since I read the spec. But I would turn it around and ask where alternate representations are indicated as valid: there would need to be some specification that defines the allowed variations.
I would be interested in knowing whether you have found anything regarding validity constraints, as I do not recall seeing general use of different groupings or lengths.
Hmmh. As per https://en.wikipedia.org/wiki/Universally_unique_identifier section "Textual representations":
Because a UUID is a 128 bit label, it can be represented in different formats.
and offers some examples without many limitations. This is possibly why existing code supports other hyphen placements.
I think I would be open to allowing alternatives with different hyphen counts if (but only if) there is some documentation of widespread use of such alternative(s).
There's a bug in UUID.fromString() which (probably) will never be fixed because it has been this way for many years.
For example, the string "a-b-c-d-e" is interpreted by JDK's UUID as "0000000a-000b-000c-000d-00000000000e", even though the input is not a "canonical string".
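The leniency is easy to reproduce. The sketch below only calls java.util.UUID.fromString(); the class name and the accepted() helper are hypothetical names made up for this illustration.

```java
import java.util.UUID;

// Hypothetical demo class; only UUID.fromString() is a real JDK API here.
final class FromStringLeniency {
    // Returns true if UUID.fromString() parses the input without throwing.
    static boolean accepted(String s) {
        try {
            UUID.fromString(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Short groups are silently left-padded with zeros.
        System.out.println(UUID.fromString("a-b-c-d-e"));
        // Five groups with irregular lengths: accepted.
        System.out.println(accepted("D1AF6FA2-BECF-4E-54AF6B-ABB8EE298A8A"));
        // Six groups: rejected with IllegalArgumentException.
        System.out.println(accepted("D1AF6FA2-BECF-4E54-AF-6B-ABB8EE298A8A"));
    }
}
```

Being accepted does not mean the irregularly grouped string is parsed to the value its author intended. The groups are read as separate numbers, so the result should not be assumed to equal the canonical UUID with the same hex digits.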
There is a formal ABNF definition for UUID strings in RFC 4122:
The formal definition of the UUID string representation is
provided by the following ABNF [7]:
UUID                   = time-low "-" time-mid "-"
                         time-high-and-version "-"
                         clock-seq-and-reserved
                         clock-seq-low "-" node
time-low               = 4hexOctet
time-mid               = 2hexOctet
time-high-and-version  = 2hexOctet
clock-seq-and-reserved = hexOctet
clock-seq-low          = hexOctet
node                   = 6hexOctet
hexOctet               = hexDigit hexDigit
hexDigit =
      "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
      "a" / "b" / "c" / "d" / "e" / "f" /
      "A" / "B" / "C" / "D" / "E" / "F"
ABNF translation:
- 1st group has 8 hex digits;
- 2nd group has 4 hex digits;
- 3rd group has 4 hex digits;
- 4th group has 4 hex digits;
- 5th group has 12 hex digits;
- It gives 36 chars (32 hex digits + 4 hyphens).
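The group lengths above translate directly into a regular expression. This is a minimal sketch of such a check, assuming only the 8-4-4-4-12 layout and the case-insensitive hex digits from the ABNF; the class and method names are hypothetical, and this is not the UUIDUtil implementation.

```java
import java.util.regex.Pattern;

// Hypothetical validator for illustration only.
final class CanonicalUuid {
    // 8-4-4-4-12 hex digits, upper or lower case, as in the RFC 4122 ABNF.
    private static final Pattern CANONICAL = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // matches() requires the whole input to fit the pattern, so no anchors are needed.
    static boolean isCanonical(String s) {
        return s != null && CANONICAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCanonical("D1AF6FA2-BECF-4E54-AF6B-ABB8EE298A8A"));
        System.out.println(isCanonical("D1AF6FA2-BECF-4E-54AF6B-ABB8EE298A8A"));
        System.out.println(isCanonical("a-b-c-d-e"));
    }
}
```

Only the first input matches. The check is purely syntactic: it does not look at the version or variant bits.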
EDIT:
Just a word of advice for anyone reading this in the future: don't rely on UUID.fromString() for validation. Use UUIDUtil or a regex.
How did I miss the ABNF in the spec? Thanks for giving this an eye.
Thank you @fabiolimace for the excellent summary and recommendations!