[Feature request] Fair shuffling
Javran opened this issue · 2 comments
A shuffle generator is introduced by #44, however sort-based shuffling has some drawbacks as described by http://okmij.org/ftp/Haskell/perfect-shuffle.txt (please text search for section "A sort-based algorithm and its critique").
I propose that either (1) we replace shuffle
with a fair shuffle (for simplicity of the implemenation, "An imperative implementation: swapping" in the same article above with inside ST should be good enough), or (2) introduce another function trueShuffle
for pedantic minds like myself.
I want to see which one would you prefer and also volunteer myself to work this :)
The critique doesn't apply. (emphasis mine)
Let us consider the simplest example (which happens to be the worst
case): a sequence of two elements, [a b]. According to the
shuffle-by-random-keys algorithm, we pick two binary random numbers,
The Test.QuickCheck.shuffle
however picks the arbitrary full-range Int
s†. Int
range (264) is huge in comparison to the length of lists you will shuffle
. Generation of the same tags is unluckily (the list have to be quite long for birthday paradox to be noticeable). The shuffling is still not perfect, but I doubt you can observe the bias.
† we can be faster by directly choosing Word64
.
That's fair given the huge range. Another argument that I forgot to mention is that an imperative swapping takes O(n) time while sorting takes O(n log n), but again this isn't very significant unless the list is very large.
I honestly don't feel strongly that this is something that needs fixing - just openning an issue and see what are your thoughts on this.