/xoroshiro128

Clojure implementation of the xoroshiro128+ PRNG described at http://xoroshiro.di.unimi.it/

Primary LanguageClojureEclipse Public License 1.0EPL-1.0

xoroshiro128+

This is a Clojure(Script) implementation of the xoroshiro128+ PRNG described at http://xoroshiro.di.unimi.it/

The algorithm has been shown to be fast and produce superior statistical results to many PRNGs shipped with languages, including Java. The statistical results have been verified in both PractRand and TestU01 by the authors.

xoroshiro128+ is designed to be the successor to xorshift128+, currently used in the JavaScript engines of Chrome, Firefox and Safari.

Both xorshift128+ and xoroshiro128+ have a period of 2128 but xoroshiro128+ is benchmarked by the authors as 20% faster and with 20% fewer failures in BigCrush than its predecessor.

Installation

Clojars Project

CircleCI

Usage

Everything is in xoroshiro128.core.

(require '[xoroshiro128.core])

Simple

The simplest usage is to simply call rand to generate a random long.

This uses an atom to store the state of the prng, which is updated to the next value in sequence on each call to rand.

rand is automatically seeded by java.util.Random or Math.random, passed through splitmix64, and so should Just Work.

This atom can be manually reset/seeded (e.g. during testing) by calling seed-rand! and passing it a long. Internally, as per the suggestion in the reference implementation, this long will be passed through a splitmix64 generator to generate the two 64 bit seeds needed for the xoroshiro128+ algorithm. seed-rand! also accepts two arguments, which should both be longs to pass directly to xoroshiro128+ (no splitmix64 involved).

Parallel calls to rand

As of version 1.1.1, calls to rand in parallel will use compare-and-set! to ensure that each value is returned only once across all threads. No guarantees about the ordering of rand are made for parallel execution.

For example, using pmap and map to produce two sequences of random numbers with rand, starting with the same seed, would:

  • create the same set of longs twice
  • not order the sequences in the same way, pmap will have incorrect ordering
  • not duplicate any values in pmap that aren't duplicated in map

See xoroshiro128.test.state for examples.

Advanced

Both xoroshiro128+ and splitmix64 are implemented as immutable types with a shared protocol. The IPRNG protocol exposes next and value to generate the next item in the sequence or the long representing the current item, respectively.

This protocol allows us to construct a lazy sequence from a seed quite easily with iterate and map.

(require '[xoroshiro128.core :as x])
(def seed 12345)
(def my-rand-seq (map x/value (iterate x/next (x/xoroshiro128+ seed))))
(take 5 my-rand-seq)
; (5983371452340661002 -139170902943549277 2600980751997770790 3131701164191746090 -3375623441569470803)

Additionally, we can inspect any item in the sequence to extract the seed, allowing us to resume the sequence from that point later.

(require '[xoroshiro128.core :as x])
(def seed 9876)
(def my-rand-item (-> (x/xoroshiro128+ seed) x/next x/next x/next))

; Peek ahead at my-rand-item + 1
(x/value (x/next my-rand-item))
; 3486300715335445982

; Extract the seed from my-rand-item for later.
(def new-seed (x/seed my-rand-item));
; [-5785456751514194665 7961309068892779353]

; Create a new item from our extracted seed.
(def new-rand-item (apply x/xoroshiro128+ new-seed))
; If we seeded correctly, new-rand-item + 1 should be the same as my-rand-item + 1.
(x/value (x/next new-rand-item))
; 3486300715335445982

Seeds

Seeds for the xoroshiro128+ algorithm must be 128 bit (as the name implies).

As clojure on the JVM supports 64 bit integers (longs) but not 128 bit integers, the seed is represented internally as two longs in a vector.

The situation is slightly more complex in clojurescript as JavaScript does not provide native support for 64 bit integers, only 64 bit floats. The google.math.Long class from Google Closure is used to provide 64 bit integer support, and then 128 bit seed support using two longs as per the clojure implementation.

As mentioned above, if only a single long is available to seed the PRNG, the splitmix algorithm can be used to extrapolate further longs to use as a seed. For convenience long->seed128 is provided convert a 64 bit seed into a 128 bit seed. This function is used internally when only a single long is provided.

The authors of Xoroshiro128+ recommend against seeding the PRNG with the output of a similar PRNG.

"research has shown that initialization must be performed with a generator radically different in nature from the one initialized to avoid correlation on similar seeds."

They also mention that similar PRNGs may be used internally by Math.random in JavaScript, depending on the browser brand and version (this could change at any time):

Albeit superseded by xoroshiro128+, xorshift128+ is presently used in the JavaScript engines of Chrome, Firefox, Safari, Microsoft Edge.

For this reason I recommend against providing the full 128 bits directly from the native PRNG in a web browser - pass 64 bits (1 number from Math.random) through long->seed128 instead (it uses splitmix64 internally to fix this problem).

It is worth noting that converting a 64 bit seed to a 128 bit seed using long->seed128 is a deterministic process, i.e. any given long always provides the same seed. This means the pool of available seeds is 64 bits, and is not magically increased to 128 bits. Whether this matters or not is entirely contextual, but providing 128 bit seeds will drastically increase the size of the pool of available pseudo random sequences to draw upon.

As UUIDs represent 128 bit integers (hexadecimal), they are also supported for convenience as seed values both via. the xoroshiro128+ function and uuid->seed128. Note that UUID generation functions leave a few bits of the UUID static to indicate the UUID version and variant. This means the seed pool for a given UUID generation function is both orders of magnitude larger than a single 64 bit seed, and smaller than two independant 64 bit seeds.

The current seed value can be extracted as a 128 bit seed vector from both Xoroshiro128+ and Splitmix64 type data with the seed function.

Jump function

The Xoroshiro128+ algorithm supports a jump function to easily create new non-overlapping sequences from any starting point.

Calling the jump function is equivalent to calling next 264 times.

As the total period of Xoroshiro128+ is 2128 it is possible to call jump 264 times before needing to reseed a totally new sequence.

If you are using simple calls to rand there is a jump-rand! function provided to jump the state of rand.

(require '[xoroshiro128.core :as x])
(def seed 55555)
(def my-rand (x/xoroshiro128+ seed))
; jumped-rand is equivalent to 2^64 calls to (x/next my-rand)
(def jumped-rand (x/jump my-rand))

Correctness

The outputs of Splitmix64 next value, Xoroshiro128+ next value, and the Xoroshiro128+ jump function have all been verified against samples from the reference C implementation.

Dozens of values were generated from https://ideone.com/PuauK5 and fed directly into the expected outputs for the test suite.

Converting to doubles in the unit interval

It is a common requirement to convert our random 64 bit integers into a random decimal number between 0 and 1. For example, this is a convenient way to insulate ourselves from goog.math.Long in cljs as the output of such a conversion will be a native JS float.

Unfortunately the naive approach of simply dividing a generated long by the the maximum long produces a biased, non-uniform distribution.

The author of Xoroshiro128+ and Splitmix64 suggests two options for correctly converting from generated 64 bit integers to a 64 bit float between [0, 1).

Their primary recommendation is available for both clj and cljs as xoroshiro128.core/long->unit-float.

I recommend using this function if you don't have any specific reason to prefer the alternative.

The alternative recommendation is available for clj only as xoroshiro128.core/long->unit-float:alt.

Referencing the alternative algorithm, the authors state:

"This technique is extremely fast, but you will be generating half the values you could actually generate."

As far as I'm aware, neither JavaScript nor Google Closure provide a native/fast way to perform the operations required by the alterative algorithm. Even if this can be implemented somehow in JS, I'm dubious whether any speed advantage over the standard algorithm would survive the implementation details (e.g. juggling strings internally). Without this speed advantage the alternative algorithm is clearly inferior, so in my opinion the implementation effort is simply not justified for JavaScript.

Performance

All benchmarking code is available under xoroshiro128.test.performance.

Clojure

I did some basic benchmarking on my laptop using criterium and found ~66% speed improvement using xoroshiro128+ vs. the default Java PRNG.

As always with benchmarking, YMMV.

I compared (.nextLong (java.util.Random.)) against (xoroshiro128.core/rand).

Results from java.util.Random:

Evaluation count : 831204360 in 60 samples of 13853406 calls.
             Execution time mean : 70.719127 ns
    Execution time std-deviation : 0.615334 ns
   Execution time lower quantile : 69.942200 ns ( 2.5%)
   Execution time upper quantile : 72.194872 ns (97.5%)
                   Overhead used : 1.675261 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Results from xoroshiro128.core/rand:

Evaluation count : 2301527460 in 60 samples of 38358791 calls.
             Execution time mean : 24.654548 ns
    Execution time std-deviation : 0.398656 ns
   Execution time lower quantile : 24.329306 ns ( 2.5%)
   Execution time upper quantile : 25.960526 ns (97.5%)
                   Overhead used : 1.687464 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 2 (3.3333 %)
	low-mild	 3 (5.0000 %)
 Variance from outliers : 2.0241 % Variance is slightly inflated by outliers

Results from (comp xoroshiro128.xoroshiro128/long->unit-float xoroshiro128.state/rand)

Evaluation count : 1925694840 in 60 samples of 32094914 calls.
             Execution time mean : 29.564911 ns
    Execution time std-deviation : 0.185632 ns
   Execution time lower quantile : 29.263758 ns ( 2.5%)
   Execution time upper quantile : 29.912975 ns (97.5%)
                   Overhead used : 1.684063 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Given these results I think it's safe to recommend xoroshiro128+ as a mostly "drop in" replacement for (.nextLong (java.util.Random.)).

ClojureScript

Unsurprisingly the performance of goog.math.Long is significantly worse across the board than native JavaScript floats.

I'm not aware of any benchmarking tool for CLJS that is as sophisticated as criterium, so I simply ran each function 1 million times and recorded the timestamps with performance.now.

Results:

LOG: '"benchmarking xoroshiro128.state/rand"'
LOG: '198.56'
LOG: '"benchmarking xoroshiro128.state/rand as xoroshiro128.xoroshiro128/long->unit-float"'
LOG: '3715.5350000000003'
LOG: '"benchmarking xoroshiro128.state/native-rand"'
LOG: '9810.575'
LOG: '"benchmarking Math.random"'
LOG: '20.904999999998836'

These numbers seem to wobble by around +/- 20% on subsequent runs.

We can see that xoroshiro128+ is ~10x slower than Math.random but it's a bit "apples vs. oranges" as Math.random produces floats between [0, 1) and xoroshiro128+ produces goog.math.Long objects across the full 64 bit integer range.

We see ~200ns per call (0.29s for 1 000 000 calls) in CLJS vs. ~25ns per call in CLJ. This puts the JVM at around 10x faster than JS for this particular use-case. This is also another "apples vs. oranges" comparison as the JVM and JS runtime environments are obviously very different.

To get "apples to apples" timings within JS (and to seed rand) I created a "native random long" function that works exactly like the internals of random-uuid, but stops halfway to return a single goog.math.Long instead of a full UUID:

; lifted from https://cljs.github.io/api/cljs.core/random-uuid
(let [hex #(.toString (rand-int 16) 16)]
 (goog.math.Long.fromString
  (apply str (take 16 (repeatedly hex)))
  16))

This approach is ~50x slower than xoroshiro128+ and ~470x slower than Math.random (due to the string manipulation, I assume).

We can also compare the speed of float generation by passing the output of rand calls to long->unit-float against Math.random. In this case, xoroshiro128+ generates floats ~180 times slower than Math.random calls.

Overall the use-cases for xoroshiro128+ in CLJS are not as clear cut as CLJ due to lack of native long support.

I recommend xoroshiro128+ when:

  • Seeding the PRNG is important
  • Working with the full range of 64 bit integers (goog.math.Long) is acceptible or even important
  • Wanting to work against the xoroshiro128.prng/IPRNG protocol
  • Compatibility with another system implementing xoroshiro128+ is important

I recommend Math.random when working with an unseeded PRNG with an undefined algorithm outputting only floats on [0, 1) is acceptible.

CLJS optimizations & environment

CLJS benchmarks were conducted on Chrome with the :advanced compiler optimization flag as this should best represent usage in production deployments. Interestingly, advanced compilation made Math.random calls about 6x slower, and goog.math.Long based logic ~30-60% faster.

Fair warning that changing the browser and CLJS compilation optimisations level drastically changes the absolute and relative benchmark timings - in some cases by 100% or more.

Cryptography

xoroshiro128+ and the family of related generators are not cryptographically secure or intended for use in cryptography.

These generators are designed to produce a seedable, statistically uniform distribution at high speeds, with a reasonable period.

License

The xoroshiro128+ algorithm reference implementation in C was developed by David Blackman and Sebastiano Vigna in 2016 under a Creative Commons public domain license https://creativecommons.org/publicdomain/zero/1.0/.

This clojure implementation is copyright © 2016 David Meister.

Distributed under the Eclipse Public License version 1.0.

Thanks

Thanks to mscharley for helping to verify the outputs of the test suite.