Wrong Algorithm for generating permutations.
jazzshop111 opened this issue · 3 comments
Please change the logic from:
for j := uint64(0); j < M; j++ {
p[j] = idx
idx += skip
if idx >= M {
idx -= M
}
}
To this one:
for j = 0; j < M; j++ {
p[j] = (offset + uint64(j)*skip) % M
}
You algorithum to generate permutations has mathematical Flaw. Please refer the original Google Maglev Hashing paper.
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf
The algorithm requires that p[j] is of size M, p[j] should not contain duplicates and should contain all numbers in range of 0 to M, which is only possible with following formula:
p[j] = (offset + uint64(j)*skip) % M
Your algorithm of generating permutations will contain duplicates in p[j] and will not have all numbers in range of 0 to M, this will create serious problems in "populate()" function.
I'm fairly certain these two algorithms are equivalent. I optimized the one from the paper to reduce the number of operations that happen during regeneration of the table. Do you have an example where there is a mismatch?
Quick back of the envelope evaluation: Consider the value of idx for each iteration of the loop as distinct values idx[0], idx[1], and so on. (We don't actually need the values, but it's convenient to think of them that way.)
Assume for the sake of argument that, for some j, idx[j] is equal to the computed value from the other expression. If idx[j] == (offset + uint64(j) * skip) % M
, then what about idx[j+1]
? Well, it will be equal to idx[j]+skip
, unless that exceeds M, in which case it will be equal to (idx[j]+skip) % M
.
What's p[j+1] with the old formula? Well, it's (offset + uint64(j+1)*skip) % M
, whereas p[j] is (offset + uint64(j)*skip) % M
. So p[j+1] is (p[j]+skip)%M
.
What that means is that if idx[j]
, computed with dgryski's method, is equal to p[j]
, computed with the multiplication expreession, then idx[j+1]
will be equal to p[j+1]
.
It is reasonably obvious that idx0 = offset, because idx is initialized with offset. Since offset is (some value) mod M, offset is in [0,M). Thus, idx0 = (offset + 0 * skip) % M
.
Therefore, by induction, idx[j] == p[j] for all j.