CCRandom initializes its own default state in a non reproducible way
nilsbecker opened this issue · 11 comments
ocaml-containers/src/core/CCRandom.ml
Line 207 in 447df82
here an internal __default_state
is created with a random seed, which is later used by default in the run
functions. this default state is different from the one in the included Stdlib.Random
module, namely Random.default
, afaict. i came across this because i encountered non-reproducible simulation runs.
i worked around it by always calling run ~st:(Random.get_default_state ())
which copies the default state from Stdlib.Random
, which i make sure to initialized in a reproducible way. i think this copying costs performance; a better way would be to create, initialize and later supply my own random state to give to the ~st
argument of run
.
anyway, i don't quite understand why CCRandom
does not reuse the default state in Stdlib.Random
? that way, Random.init
would make CCRandom.run
with default state reproducible, iiuc.
Hmm, I don't see a Random.default
in merlin nor in the stdlib manual, is it a recent addition?
I see your point about reproducibility, but on the other hand when I call random()
i don't necessarily expect it to be the same on each run of a program. I think both defaults have issues. I'm open to using a fixed seed, but I think it needs more discussion.
What do other programming languages do here? If they have a similar situation?
Looks like you should call Random.State.get_state()
for the default init, so that in case the user seeded the random state (to maintain reproducibility), you use the same state.
i think i didn't make my point very clearly, sorry about that. Random.default
is not exposed; you can find it in random.ml only. it is the state that gets used (and mutated) when calling Random
functions without referring to a specific state.
i didn't mean to suggest using a deterministic seed by default in CCRandom
. just calling random
should give a pseudo-random, non-deterministic value. however i think it's important to have a way to apply a deterministic seed to make simulations reproducible, if desired. this is possible in Stdlib
with Random.init
, for all random generators that use the default state at once. currently, with the way CCRandom
handles state, i did not see how that can be done. instead one has to specify a state as the ~st
argument to Random.run
.
@bluddy: that was my workaround: i explicitly called run ~st:(get_state ())
. but i think it's not ideal since get_state
copies an allocates needlessly. imo it would be better to take (the non-exposed) Random.default
directly, if possible. then Random.init
would also seed CCRandom.run
reproducibly.
@nilsbecker it's not possible because it's not exposed in the interface. It's also just one allocation and really not a huge deal IMO.
@bluddy ah. i thought maybe include
lets you include the raw module without filtering by signature but i guess that would break abstraction. also apparently it's not possible to blit from the default state with the provided interface of Random.State
, which would have enabled pre-allocating.
then indeed get_state ()
seems the only way to get at the state. in my stochastic simulations i call run
a lot and i do see a 15-20% performance hit from it in a specific example, which i can avoid by allocating my own state.
there is another unwanted effect of get_state
. copying the state means the original state is not modified by creation of a new random number. so calling Random.run ~st:Random.(get_state ()) rng
twice in a row will yield two times the same random number (!)
we could have a toplevel let default_state = Random.get_state()
which is called early on, and then use that one as a default: let run ?(state=default_state) g = …
i guess. however, Random.set_state state
would then not set that one. ah, maybe one could override it: in CCRandom:
let default_state = ref (Random.get_state ())
let set_state state = default_state := state; Stdlib.Random.set_state state
looks ugly but at least a reset would do what one would expect. it's a duplication of effort but i don't see how one could hook into the chain of states that Stdlib.Random
generates in its own rngs.
but what should (CC)Random.get_state ()
then return? the original one? the extra one? i think one wants the invariant that first calling Random.get_state ()
and then running a rng will yield the same result as let st = Random.get_state () in Random.set_state st;
and then running the rng. i think this is not so with the snippet above, since the CCRandom
default state will in general have been different from the Stdlib.Random
default state before calling set_state
. damn.
This seems confusing, because it depends on whether or not you open Containers;;
.
It seems like we have conflicting requirements here:
- be compatible with OCaml as a drop-in replacement
- be non-surprising (as in, calling
run g
several times should look random rather than always returning the same thing)
I'm not sure we can satisfy both. For the second one we could at least make the __default_state
deterministic using a fixed seed (which wouldn't be the same as Random
, but I think that's ok, one should not depend on that).
i would think that not having
* calling `run g` several times should look random rather than always returning the same thing
would be so surprising to qualify as a bug. there is also
- calling
run g
without initialization should give pseudo random results on successive runs of the program
which i think is nice to have and expected but not having it may be less of a problem?
[edit: deleted, i think this was not correct. the CCrandom generators mostly require an explicit state argument, so there is no problem. only run
seems to be problematic]
so a solution could be to be explicit that run
without ~st
updates its own private state, and to supply setter and getter functions set_run_state
and get_run_state
specifically to handle it?
I don't know what to think. set_run_state
and get_run_state
do seem like a reasonable idea, provided it's all well documented. This is all mostly a documentation issue as people can have as much control as they want already by just passing ~st
explicitly.