impy-project/chromo

Replace RMMARD with a modern PRNG

HDembinski opened this issue · 3 comments

Is there are reason why we want to use RMMARD for all models, apart from "CORSIKA uses it"? RMMARD is a bad generator.

  • It has a complex state and the implementation is also way more complex than necessary, which means it is very likely slow. The latest generation of PRNGs are extremely simple (state is an array of 2 to 4 int64), very fast (typically only a handful lines of code) and some can exploit SIMD instructions to make them even faster.
  • It is also limited because it returns always double precision numbers. A PRNG should return integers which are then converted into other numbers. Fundamentally, a PRNG generates random bits.
  • It is unknown what its statistical properties are. Lots of PRNGs that used to be state-of-the-art have been dethroned, like the Mersenne-Twister.and they are known to pass the hardest statistical tests.
  • It only allows to use one integer for seeding, which only allows 2**32 different streams of random numbers. A proper PRNG allows one to seed the full state of the generator, to access the full number of independent streams which are allowed by the generator, typically 2 ** 128 or more in modern PRNGs.

Modern generators considered good are xoshiro256++ (see https://prng.di.unimi.it which also has a nice comparison of modern PRNGs) or the PCG64 generator implemented in Numpy
https://github.com/numpy/numpy/blob/main/numpy/random/_pcg64.pyx
https://github.com/numpy/numpy/blob/main/numpy/random/src/pcg64/pcg64.c

Using the latter has many advantages.

  • We don't need to implement and maintain and test the generator ourselves.
  • Loading/saving state in Python is already supported.
  • We can use multinomial distribution from numpy to generate nuclei from the CompositeTarget without using a second random number generator.

We just need a Fortran wrapper for the C implementation, which seems easy to write:
https://engineering.purdue.edu/ECN/Support/KB/Docs/CallingCFromFortran

It would be good, if there will be only one source of random numbers. Especially good, that it is a modern and standard. I wonder about technical moments: how it will be possible to use the same instance in fortran code and in numpy at the same time. Should the fortran wrapper be somehow injected into numpy code?

how it will be possible to use the same instance in fortran code and in numpy at the same time.

We have to see, but I don't see a fundamental obstacle. The state of the generator is a few integers. We either copy the state around manually to ensure that the generators in Fortran and Python are in sync, or we make it so that Fortran uses the Numpy state. The latter seems possible, since the Fortran code is supposed to call the Numpy implementation (which itself is C code) via some wrapper functions. The state is then not duplicated in Fortran.

All we need from Numpy is a pointer its PCG state in memory. We should be able to get that pointer from Numpy. The pointer can be used by the Fortran interface. In Fortran, all arguments to functions are passed as pointers anyway.

I didn't have time to implement this, but I really wanted to see whether it works, and I managed to puzzle a basic demo together. I am fairly proud that I got this working, considering that I have no clue how the Fortran syntax works.

Edit I improve the example so that it uses rangen.fpp. The new version can use any PRNG implemented in numpy, not only PCG64. See
https://github.com/HDembinski/essays/tree/master/numpy_rng_fortran

To build and test the example, you need meson, numpy, and pybind11.

pip install meson, pybind11, numpy
meson setup build
cd build
ninja
python -c 'import rangen, numpy as np; r = np.random.default_rng(1); rangen.init_fortran(r); print(rangen.call_fortran())'