axiomhq/hyperloglog

document requirements for InsertHash

RaduBerinde opened this issue · 2 comments

It would be good to provide some documentation about what properties are expected of the input hash for InsertHash. For example, a reasonable question is: if my data is a set of 64-bit ints, can I use them directly as the "hash"? I ran some experiments with that and got bad results. I also got bad results when I used a simplistic hash function (xor and multiply by large prime). Does it require good avalanche characteristics?

The hash should be uniform across the full 64-bit range. If you limit it to a range of [0, n], you are guaranteeing that some registers will never be set. I will write some examples and put them into the documentation. WDYT @RaduBerinde?

Sounds great, thanks! Yeah, after typing up this issue I looked more into HLL and realized it assumes uniformity. I think it's an important point because some hash functions focus only on low collisions.