Optimised random sampling methods
mratsim opened this issue · 1 comments
Random sampling methods for common distributions are required.
Normal distribution:
Currently Arraymancer uses Box-Muller: https://github.com/mratsim/Arraymancer/blob/d05ef61847601fe253837329e0c8c47be18e8d9d/src/tensor/init_cpu.nim#L209-L235
proc randomNormal(mean = 0.0, std = 1.0): float =
## Random number in the normal distribution using Box-Muller method
## See https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
var valid {.global.} = false
var x {.global.}, y {.global.}, rho {.global.}: float
if not valid:
x = rand(1.0)
y = rand(1.0)
rho = sqrt(-2.0 * ln(1.0 - y))
valid = true
return rho*cos(2.0*PI*x)*std+mean
else:
valid = false
return rho*sin(2.0*PI*x)*std+mean
The polar method and Ziggurat algorithm are apparently faster.
Reading
I can give some feedback here. 8 years ago or so, I timed C versions of Ziggurat as much faster than Box-Muller for normal deviates on Intel Nehalem..About 25 clock cycles. Marsaglia's original paper gets about 400 Mhz/15 Mps = 27 cycles. I haven't timed it lately, but "around 20-30" cycles seems pretty likely. That's for float32 precision unit normals and when you can keep 3 different 512B tables in L1. Often one generates a big batch of numbers and then uses them, though. So, cache competition is probably not hard to address in practice.