oklog/ulid

Generated ULIDs should be unambiguous

muxmuse opened this issue · 4 comments

The encoded form of a ULID carries 2 bits more than the binary form (26 symbols of 5 bits each is 130 bits, versus 128 bits). For string representations to be comparable, only the following values can be used for the rightmost (least significant) symbol:
0,4,8,C,G,M,R,W
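
To make the list above checkable, here is a minimal sketch that enumerates the Crockford base32 symbols whose two least significant bits are zero. It assumes the reading used in this comment, where the 2 surplus bits sit at the right end of the string; the helper name `lowBitsClear` is made up for this illustration.

```go
package main

import "fmt"

// crockford is the Crockford base32 alphabet (it omits I, L, O and U).
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// lowBitsClear returns the symbols whose 5-bit value has its two
// least significant bits set to zero (hypothetical helper).
func lowBitsClear() string {
	out := make([]byte, 0, 8)
	for v := 0; v < len(crockford); v++ {
		if v&0b11 == 0 {
			out = append(out, crockford[v])
		}
	}
	return string(out)
}

func main() {
	fmt.Println(lowBitsClear()) // 048CGMRW
}
```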

This library generates other values. I would not consider it a bug, but a potential improvement.

Can you write up a quick demonstration program, to illustrate what you mean?

The following string representations of ulids will decode to the same all-zero byte representation

"00000000000000000000000000"
"00000000000000000000000001"
"00000000000000000000000002"
"00000000000000000000000003"

You can use any valid base32crockford decoding algorithm to check this, e.g.: https://www.dcode.fr/crockford-base-32-encoding

That means that all those strings represent the same ulid, although they are not equal. Imo, when generating ulids in string representation, the last 2 bits should always be set to zero to avoid this confusion.
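
The claim can be reproduced without an online tool. The sketch below is a generic, stream-style Crockford decoder of the kind this comment seems to assume: it reads 5 bits per symbol from left to right, emits whole bytes, and drops the 2 leftover bits. It is not the oklog/ulid decoder; `decodeRightPadded` is a made-up name, and only uppercase input is handled for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// decodeRightPadded treats s as a left-to-right bit stream with 5 bits
// per symbol, emits every complete byte, and discards trailing bits.
func decodeRightPadded(s string) ([]byte, error) {
	var out []byte
	var acc uint32 // bit accumulator
	var n uint     // number of pending bits in acc
	for i := 0; i < len(s); i++ {
		v := strings.IndexByte(crockford, s[i])
		if v < 0 {
			return nil, fmt.Errorf("invalid symbol %q at position %d", s[i], i)
		}
		acc = acc<<5 | uint32(v)
		n += 5
		if n >= 8 {
			n -= 8
			out = append(out, byte(acc>>n))
			acc &= 1<<n - 1 // keep only the bits not yet emitted
		}
	}
	return out, nil
}

func main() {
	for _, s := range []string{
		"00000000000000000000000000",
		"00000000000000000000000001",
		"00000000000000000000000002",
		"00000000000000000000000003",
	} {
		b, err := decodeRightPadded(s)
		fmt.Printf("%s -> %X err=%v\n", s, b, err)
	}
}
```

With this decoder all four strings yield 16 zero bytes, because the differing bits are exactly the 2 that get discarded. A final symbol of 4 is the first one that changes the last byte.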

The following string representations of ulids will decode to the same all-zero byte representation

Am I doing this wrong?

package main

import (
	"fmt"

	"github.com/oklog/ulid/v2"
)

func main() {
	for _, s := range []string{
		"00000000000000000000000000",
		"00000000000000000000000001",
		"00000000000000000000000002",
		"00000000000000000000000003",
	} {
		u, err := ulid.Parse(s)
		fmt.Printf("ulid.Parse(%q) -> %s (%0X) err=%v\n", s, u, u, err)
	}
}

has output

ulid.Parse("00000000000000000000000000") -> 00000000000000000000000000 (3030303030303030303030303030303030303030303030303030) err=<nil>
ulid.Parse("00000000000000000000000001") -> 00000000000000000000000001 (3030303030303030303030303030303030303030303030303031) err=<nil>
ulid.Parse("00000000000000000000000002") -> 00000000000000000000000002 (3030303030303030303030303030303030303030303030303032) err=<nil>
ulid.Parse("00000000000000000000000003") -> 00000000000000000000000003 (3030303030303030303030303030303030303030303030303033) err=<nil>

Or do you mean to say these strings should parse as the same value?
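
A note on reading the output above: the hex column appears to be the hex of the 26-character text rather than of the 16 underlying bytes (0x30 is ASCII '0'). That is consistent with how `fmt` handles `%X` for a value that has a `String` method. The sketch below shows the effect with a toy type; `id`, `hexOfValue` and `hexOfBytes` are hypothetical names and not part of oklog/ulid.

```go
package main

import "fmt"

// id is a hypothetical 2-byte identifier used only for this illustration.
type id [2]byte

// String renders each byte as a decimal number (toy encoding).
func (v id) String() string {
	return fmt.Sprintf("%d%d", v[0], v[1])
}

// hexOfValue formats the value itself; fmt calls String first and then
// hex-encodes the resulting text.
func hexOfValue(v id) string { return fmt.Sprintf("%X", v) }

// hexOfBytes formats the raw bytes by slicing the array.
func hexOfBytes(v id) string { return fmt.Sprintf("%X", v[:]) }

func main() {
	v := id{0, 1}
	fmt.Println(hexOfValue(v)) // 3031 (hex of the text "01")
	fmt.Println(hexOfBytes(v)) // 0001 (hex of the bytes)
}
```

Under that assumption, printing `u[:]` instead of `u` in the program above would show the raw 16 bytes of each parsed value.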

Hello @peterbourgon, thank you for taking the time. I think that the implementation is correct, and I will close this issue.

RFC 4648 defines base32 encoding with padding at the end of the input. ULIDs, by contrast, are specified to start with a symbol <= '7', which effectively pads from the left. My mistaken assumption was that the encoded ULID was defined as base32crockford(128bit ulid).
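
For contrast with the right-padded reading, here is a minimal sketch of the left-padded interpretation described above: the 26 symbols are read as one 130-bit big-endian number whose top 2 bits must be zero. It is an illustration using `math/big`, not the oklog/ulid implementation; `decodeLeftPadded` is a made-up name, and only uppercase input is handled.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
	"strings"
)

const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// decodeLeftPadded reads 26 Crockford symbols as a 130-bit big-endian
// number and returns its low 128 bits. Values that need more than
// 128 bits (first symbol greater than '7') are rejected.
func decodeLeftPadded(s string) ([16]byte, error) {
	var out [16]byte
	if len(s) != 26 {
		return out, errors.New("want exactly 26 symbols")
	}
	n := new(big.Int)
	for i := 0; i < len(s); i++ {
		v := strings.IndexByte(crockford, s[i])
		if v < 0 {
			return out, fmt.Errorf("invalid symbol %q at position %d", s[i], i)
		}
		n.Lsh(n, 5)
		n.Or(n, big.NewInt(int64(v)))
	}
	if n.BitLen() > 128 {
		return out, errors.New("overflow: first symbol must be <= '7'")
	}
	n.FillBytes(out[:]) // big-endian, zero-padded on the left
	return out, nil
}

func main() {
	for _, s := range []string{
		"00000000000000000000000000",
		"00000000000000000000000001",
		"00000000000000000000000002",
		"00000000000000000000000003",
		"7ZZZZZZZZZZZZZZZZZZZZZZZZZ",
		"80000000000000000000000000",
	} {
		b, err := decodeLeftPadded(s)
		fmt.Printf("%s -> %X err=%v\n", s, b, err)
	}
}
```

Under this interpretation every bit of the last symbol is significant, so the four strings from the earlier comment decode to four distinct values, and the ambiguity moves to the first symbol, where it is ruled out by the overflow check.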