c-cube/ocaml-containers

CCRandom initializes its own default state in a non reproducible way

nilsbecker opened this issue · 11 comments

let __default_state = Random.State.make_self_init ()

here an internal __default_state is created with a random seed, which is later used by default in the run functions. this default state is different from the one in the included Stdlib.Random module, namely Random.default, afaict. i came across this because i encountered non-reproducible simulation runs.

i worked around it by always calling run ~st:(Random.get_default_state ()) which copies the default state from Stdlib.Random, which i make sure to initialized in a reproducible way. i think this copying costs performance; a better way would be to create, initialize and later supply my own random state to give to the ~st argument of run.

anyway, i don't quite understand why CCRandom does not reuse the default state in Stdlib.Random? that way, Random.init would make CCRandom.run with default state reproducible, iiuc.

Hmm, I don't see a Random.default in merlin nor in the stdlib manual, is it a recent addition?

I see your point about reproducibility, but on the other hand when I call random() i don't necessarily expect it to be the same on each run of a program. I think both defaults have issues. I'm open to using a fixed seed, but I think it needs more discussion.

What do other programming languages do here? If they have a similar situation?

Looks like you should call Random.State.get_state() for the default init, so that in case the user seeded the random state (to maintain reproducibility), you use the same state.

i think i didn't make my point very clearly, sorry about that. Random.default is not exposed; you can find it in random.ml only. it is the state that gets used (and mutated) when calling Random functions without referring to a specific state.

i didn't mean to suggest using a deterministic seed by default in CCRandom. just calling random should give a pseudo-random, non-deterministic value. however i think it's important to have a way to apply a deterministic seed to make simulations reproducible, if desired. this is possible in Stdlib with Random.init, for all random generators that use the default state at once. currently, with the way CCRandom handles state, i did not see how that can be done. instead one has to specify a state as the ~st argument to Random.run.

@bluddy: that was my workaround: i explicitly called run ~st:(get_state ()). but i think it's not ideal since get_state copies an allocates needlessly. imo it would be better to take (the non-exposed) Random.default directly, if possible. then Random.init would also seed CCRandom.run reproducibly.

@nilsbecker it's not possible because it's not exposed in the interface. It's also just one allocation and really not a huge deal IMO.

@bluddy ah. i thought maybe include lets you include the raw module without filtering by signature but i guess that would break abstraction. also apparently it's not possible to blit from the default state with the provided interface of Random.State, which would have enabled pre-allocating.

then indeed get_state () seems the only way to get at the state. in my stochastic simulations i call run a lot and i do see a 15-20% performance hit from it in a specific example, which i can avoid by allocating my own state.

there is another unwanted effect of get_state. copying the state means the original state is not modified by creation of a new random number. so calling Random.run ~st:Random.(get_state ()) rng twice in a row will yield two times the same random number (!)

we could have a toplevel let default_state = Random.get_state() which is called early on, and then use that one as a default: let run ?(state=default_state) g = …

i guess. however, Random.set_state state would then not set that one. ah, maybe one could override it: in CCRandom:

let default_state = ref (Random.get_state ())
let set_state state = default_state := state; Stdlib.Random.set_state state

looks ugly but at least a reset would do what one would expect. it's a duplication of effort but i don't see how one could hook into the chain of states that Stdlib.Random generates in its own rngs.

but what should (CC)Random.get_state () then return? the original one? the extra one? i think one wants the invariant that first calling Random.get_state () and then running a rng will yield the same result as let st = Random.get_state () in Random.set_state st; and then running the rng. i think this is not so with the snippet above, since the CCRandom default state will in general have been different from the Stdlib.Random default state before calling set_state. damn.

This seems confusing, because it depends on whether or not you open Containers;;.

It seems like we have conflicting requirements here:

  • be compatible with OCaml as a drop-in replacement
  • be non-surprising (as in, calling run g several times should look random rather than always returning the same thing)

I'm not sure we can satisfy both. For the second one we could at least make the __default_state deterministic using a fixed seed (which wouldn't be the same as Random, but I think that's ok, one should not depend on that).

i would think that not having

* calling `run g` several times should look random rather than always returning the same thing

would be so surprising to qualify as a bug. there is also

  • calling run g without initialization should give pseudo random results on successive runs of the program

which i think is nice to have and expected but not having it may be less of a problem?

[edit: deleted, i think this was not correct. the CCrandom generators mostly require an explicit state argument, so there is no problem. only run seems to be problematic]

so a solution could be to be explicit that run without ~st updates its own private state, and to supply setter and getter functions set_run_state and get_run_state specifically to handle it?

I don't know what to think. set_run_state and get_run_state do seem like a reasonable idea, provided it's all well documented. This is all mostly a documentation issue as people can have as much control as they want already by just passing ~st explicitly.