Faster Zipfian generator
leoholz opened this issue · 0 comments
Hi there,
for a research project I borrowed the Zipfian generator from go-ycsb (https://github.com/pingcap/go-ycsb/blob/master/pkg/generator/zipfian.go) and applied three patches to make it faster. Maybe you want to apply some of them to the project. You can see the differences here (among clutter from changing Go package paths): master...leoholz:master
-
The check in line 158 recomputes the threshold for the first item on every call of Next(). Unfortunately, the Go compiler is not clever enough to recognize that this value never changes after initialization. You can about half the calculation time of Next() by replacing
1.0+math.Pow(0.5, z.theta)
with a precalculated value. -
The performance of
math.Pow()
depends on the number of digits that the exponent has. Here the floating point math on line 110 is not helpful, e.g. for the default Zipfian constant 0.99 the computed alpha value is 99.99999999999991 (https://play.golang.org/p/P76sdDmLaZj). Therefore, one can speed up calculations by sensibly rounding the divisor:z.alpha = 1.0 / (math.Round((1.0-theta)*math.Pow10(10)) / math.Pow10(10))
(https://play.golang.org/p/JqWTjos6zru). -
The most time-consuming part of the initialization phase is the calculation of the harmonic number in zetaStatic(), this takes about 2 minutes for my test set of
1024*1024*1024
items. Luckily, the parts of the sum are independent of each other, so it is possible to use concurrency to speed things up. You can find an implementation here: https://github.com/leoholz/go-ycsb/blob/master/pkg/generator/zipfian.go#L135