dgryski/go-maglev

Wrong Algorithm for generating permutations.

jazzshop111 opened this issue · 3 comments

Please change the logic from:

for j := uint64(0); j < M; j++ {
p[j] = idx
idx += skip
if idx >= M {
idx -= M
}
}

To this one:

for j = 0; j < M; j++ {
p[j] = (offset + uint64(j)*skip) % M
}

You algorithum to generate permutations has mathematical Flaw. Please refer the original Google Maglev Hashing paper.
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf

The algorithm requires that p[j] is of size M, p[j] should not contain duplicates and should contain all numbers in range of 0 to M, which is only possible with following formula:
p[j] = (offset + uint64(j)*skip) % M

Your algorithm of generating permutations will contain duplicates in p[j] and will not have all numbers in range of 0 to M, this will create serious problems in "populate()" function.

I'm fairly certain these two algorithms are equivalent. I optimized the one from the paper to reduce the number of operations that happen during regeneration of the table. Do you have an example where there is a mismatch?

seebs commented

Quick back of the envelope evaluation: Consider the value of idx for each iteration of the loop as distinct values idx[0], idx[1], and so on. (We don't actually need the values, but it's convenient to think of them that way.)

Assume for the sake of argument that, for some j, idx[j] is equal to the computed value from the other expression. If idx[j] == (offset + uint64(j) * skip) % M, then what about idx[j+1]? Well, it will be equal to idx[j]+skip, unless that exceeds M, in which case it will be equal to (idx[j]+skip) % M.

What's p[j+1] with the old formula? Well, it's (offset + uint64(j+1)*skip) % M, whereas p[j] is (offset + uint64(j)*skip) % M. So p[j+1] is (p[j]+skip)%M.

What that means is that if idx[j], computed with dgryski's method, is equal to p[j], computed with the multiplication expreession, then idx[j+1] will be equal to p[j+1].

It is reasonably obvious that idx0 = offset, because idx is initialized with offset. Since offset is (some value) mod M, offset is in [0,M). Thus, idx0 = (offset + 0 * skip) % M.

Therefore, by induction, idx[j] == p[j] for all j.