nick8325/quickcheck

[Feature request] Fair shuffling

Javran opened this issue · 2 comments

A shuffle generator is introduced by #44, however sort-based shuffling has some drawbacks as described by http://okmij.org/ftp/Haskell/perfect-shuffle.txt (please text search for section "A sort-based algorithm and its critique").

I propose that either (1) we replace shuffle with a fair shuffle (for simplicity of the implemenation, "An imperative implementation: swapping" in the same article above with inside ST should be good enough), or (2) introduce another function trueShuffle for pedantic minds like myself.

I want to see which one would you prefer and also volunteer myself to work this :)

The critique doesn't apply. (emphasis mine)

Let us consider the simplest example (which happens to be the worst
case): a sequence of two elements, [a b]. According to the
shuffle-by-random-keys algorithm, we pick two binary random numbers,

The Test.QuickCheck.shuffle however picks the arbitrary full-range Ints. Int range (264) is huge in comparison to the length of lists you will shuffle. Generation of the same tags is unluckily (the list have to be quite long for birthday paradox to be noticeable). The shuffling is still not perfect, but I doubt you can observe the bias.

we can be faster by directly choosing Word64.

That's fair given the huge range. Another argument that I forgot to mention is that an imperative swapping takes O(n) time while sorting takes O(n log n), but again this isn't very significant unless the list is very large.

I honestly don't feel strongly that this is something that needs fixing - just openning an issue and see what are your thoughts on this.