ngneat/falso

Non-uniform distribution of `randNumber` generation

Malien opened this issue · 2 comments

Malien commented

Is this a regression?

No

Description

Given the following snippet

import { randNumber } from "@ngneat/falso"

const freq = Array(10).fill(0)
for (const n of randNumber({ min: 0, max: 9, length: 10_000 }) {
  freq[n]++
}

we can see that the distribution is almost uniform (sample output; run-to-run variance is negligible):

[563, 1041, 1123, 1062, 1069, 1171, 1171, 1113, 1109, 578]

First and last entries are always encountered half as much. No matter, if the number is an integer, a float (via setting fraction parameter), and does not depend on the range size (min and max parameters), probabilities converge to
$$\left[\frac 1 {2 \cdot (S - 1) }, \frac 1 {S - 1}, \cdots, \frac 1 {S-1}, \frac 1 {2\cdot(S-1)} \right]$$
for a given amount of discrete values (S) with greater number of tests.

It is non-intuitive. This looks like a bug, in otherwise uniform-ish distribution.

Please provide a link to a minimal reproduction of the bug

No response

Please provide the exception or error you saw

No response

Please provide the environment you discovered this bug in

No response

Anything else?

Changing the algo will impact current "hard-seeded" outcomes, as such this might be a breaking change. I'm not saying that this is an important issue, just something I've noted. There are workarounds, such as using built-in (non-seedable) Math.random, using another lib, or using randNumber adding one more discrete value, and collapsing first and last outcomes into one.

Do you want to create a pull request?

No

@Malien telling from this line:
https://github.com/ngneat/falso/blob/main/packages/falso/src/lib/core/core.ts#L64C10-L64C66

This is caused by the algorithm itself.
If you run the same test but for the number coming from:

Number((Math.random() * (9 - 0) + 0).toFixed(0))

you get identical result. If we go further and use round instead of toFixed:

Number(Math.round((Math.random() * (3 - 1) + 1)))

We again get identical result.

So I think to solve that issue this particular line needs to be changed.

@Malien
Number(Math.floor((Math.random() * (10 - 0) + 0)) (floor and max + 1) produces proper uniform distribution.

This is compliant with MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random#getting_a_random_integer_between_two_values that tells us, that using Math.round() won't give you uniform distribution but Math.floor does.

This will fix generating for integer, however I'm not sure - yet - how to properly handle precision.