Generated ULIDs should be unambiguous
muxmuse opened this issue · 4 comments
The encoded form of a ULID carries 2 bits more than the binary form. To keep string representations comparable, only the following values can be used for the rightmost (least significant) symbol:
0,4,8,C,G,M,R,W
This library generates other values. I would not consider it a bug, but a potential improvement.
Can you write up a quick demonstration program, to illustrate what you mean?
The following string representations of ulids will decode to the same all-zero byte representation
"00000000000000000000000000"
"00000000000000000000000001"
"00000000000000000000000002"
"00000000000000000000000003"
You can use any valid base32crockford decoding algorithm to check this, e.g.: https://www.dcode.fr/crockford-base-32-encoding
That means that all those strings represent the same ulid, although they are not equal. Imo, when generating ulids in string representation, the last 2 bits should always be set to zero to avoid this confusion.
The following string representations of ulids will decode to the same all-zero byte representation
Am I doing this wrong?
package main

import (
	"fmt"

	"github.com/oklog/ulid/v2"
)

func main() {
	for _, s := range []string{
		"00000000000000000000000000",
		"00000000000000000000000001",
		"00000000000000000000000002",
		"00000000000000000000000003",
	} {
		u, err := ulid.Parse(s)
		fmt.Printf("ulid.Parse(%q) -> %s (%0X) err=%v\n", s, u, u, err)
	}
}
has output
ulid.Parse("00000000000000000000000000") -> 00000000000000000000000000 (3030303030303030303030303030303030303030303030303030) err=<nil>
ulid.Parse("00000000000000000000000001") -> 00000000000000000000000001 (3030303030303030303030303030303030303030303030303031) err=<nil>
ulid.Parse("00000000000000000000000002") -> 00000000000000000000000002 (3030303030303030303030303030303030303030303030303032) err=<nil>
ulid.Parse("00000000000000000000000003") -> 00000000000000000000000003 (3030303030303030303030303030303030303030303030303033) err=<nil>
Or do you mean to say these strings should parse as the same value?
Hello @peterbourgon, thank you for taking the time. I think that the implementation is correct, and I will close this issue.
RFC 4648 defines base32 encoding with padding at the end of the input. ULIDs, instead, are specified to start with a symbol <= '7', effectively padding from the left. My wrong assumption was that the encoded ULID was defined as base32crockford(128-bit ulid).
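For anyone landing here later, a minimal stdlib-only sketch of the two readings. It treats the 26 symbols as a 130-bit integer, then compares the left-padded reading (top 2 bits must be zero, so the first symbol is at most '7') with the tail-padded reading I had wrongly assumed (last 2 bits dropped). The helpers `decode130`, `fits128` and `tailPadded` are hypothetical names for illustration, not part of oklog/ulid.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// crockford is the Crockford base32 alphabet (I, L, O and U are excluded).
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// decode130 reads a 26-symbol string as a big-endian integer,
// 5 bits per symbol, 130 bits in total.
func decode130(s string) (*big.Int, error) {
	if len(s) != 26 {
		return nil, fmt.Errorf("want 26 symbols, got %d", len(s))
	}
	n := new(big.Int)
	for _, c := range strings.ToUpper(s) {
		v := strings.IndexRune(crockford, c)
		if v < 0 {
			return nil, fmt.Errorf("invalid symbol %q", c)
		}
		n.Lsh(n, 5)
		n.Or(n, big.NewInt(int64(v)))
	}
	return n, nil
}

// fits128 reports whether the value is usable under the left-padded
// reading: the two most significant of the 130 bits must be zero.
func fits128(n *big.Int) bool { return n.BitLen() <= 128 }

// tailPadded applies the reading assumed at the start of this issue:
// the two least significant bits are treated as padding and dropped.
func tailPadded(n *big.Int) *big.Int { return new(big.Int).Rsh(n, 2) }

func main() {
	for _, s := range []string{
		"00000000000000000000000000",
		"00000000000000000000000001",
		"00000000000000000000000002",
		"00000000000000000000000003",
		"7ZZZZZZZZZZZZZZZZZZZZZZZZZ",
		"80000000000000000000000000",
	} {
		n, err := decode130(s)
		if err != nil {
			fmt.Println(s, "error:", err)
			continue
		}
		fmt.Printf("%s left-padded=%x fits128=%v tail-padded=%x\n",
			s, n, fits128(n), tailPadded(n))
	}
}
```

Under the left-padded reading the four strings from above decode to 0, 1, 2 and 3, so they are distinct ULIDs. Only the tail-padded reading collapses them to the same value. A first symbol of '8' or higher does not fit in 128 bits.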