pingcap/go-ycsb

Faster Zipfian generator

leoholz opened this issue · 0 comments

Hi there,

for a research project I borrowed the Zipfian generator from go-ycsb (https://github.com/pingcap/go-ycsb/blob/master/pkg/generator/zipfian.go) and applied three patches to make it faster. Maybe you want to apply some of them to the project. You can see the differences here (among clutter from changing Go package paths): master...leoholz:master

  1. The check in line 158 recomputes the threshold for the first item on every call of Next(). Unfortunately, the Go compiler is not clever enough to recognize that this value never changes after initialization. You can about half the calculation time of Next() by replacing 1.0+math.Pow(0.5, z.theta) with a precalculated value.

  2. The performance of math.Pow() depends on the number of digits that the exponent has. Here the floating point math on line 110 is not helpful, e.g. for the default Zipfian constant 0.99 the computed alpha value is 99.99999999999991 (https://play.golang.org/p/P76sdDmLaZj). Therefore, one can speed up calculations by sensibly rounding the divisor: z.alpha = 1.0 / (math.Round((1.0-theta)*math.Pow10(10)) / math.Pow10(10)) (https://play.golang.org/p/JqWTjos6zru).

  3. The most time-consuming part of the initialization phase is the calculation of the harmonic number in zetaStatic(), this takes about 2 minutes for my test set of 1024*1024*1024 items. Luckily, the parts of the sum are independent of each other, so it is possible to use concurrency to speed things up. You can find an implementation here: https://github.com/leoholz/go-ycsb/blob/master/pkg/generator/zipfian.go#L135