typelevel/scalacheck

Fragile test of SeedSpecification

ashawley opened this issue · 3 comments

I saw a curious failure on GitHub Actions that I hadn't seen before:

[info] failing seed for Seed.longs are evenly-distributed is -aQhCwV8oBUYlWwee46hAk7cDCysDENMKQKEw-FnwEC=
[info] ! Seed.longs are evenly-distributed: Falsified after 368 passed tests.
[info] > ARG_0: Seed.fromBase64("mngJYod-N9mPAhd861EV4YJQcckrt-vaCc8Eabb013H=")
[info] > ARG_1: Base(53)
[error] Failed: Total 8, Failed 1, Errors 0, Passed 7
[error] Failed tests:
[error]  org.scalacheck.rng.SeedSpecification

Draft PR #711 shows that the seed fails.

I believe this is a probabilistic property test, so it's probably acceptable to fail rarely.

non commented

I re-read the test. I think the predicate has a random 1-in-1.7M chance of failing each time we run it, but since we evaluate the forAll 100 times, we really have a 1-in-1700 chance of failing each time we run SeedSpecification. Since we tend to run CI over a lot of different targets this means we have more like a 1-in-170 (or so) chance of failing.

We can reduce the chances of false positives at the expense of test accuracy: right now we expect the random value we produce to be within 5 standard deviations of the expected result (https://en.wikipedia.org/wiki/Standard_deviation#Rules_for_normally_distributed_data).

If we used 6 standard deviations instead of 5, we'd expect the test to randomly fail only 1-in-506.8M (about 300x less often). We could also do two separate tests and require both to fail: requiring two consecutive failures at 4 standard deviations has a 1-in-249M chance of failing (1-in-15,787 x 1-in-15787).

Do you think we should change it? I'm fine closing your PR and this issue, or making that change (which could obscure real bugs but will also cut down on noise).

I was just wondering if your probability test had a flaw or if the the issue was with the random number generator. A failure did seem to be super rare over the last 4 years as you described. I'm fine with closing this.