purescript/purescript-lists

sort and sortBy are not stack safe

milesfrain opened this issue · 4 comments

sort $ range 1 1000000
/home/miles/projects/purescript/lists/output/Data.List/index.js:287
                            return as(new Data_List_Types.Cons(a, ys));
                                      ^
RangeError: Maximum call stack size exceeded
    at $tco_var_as (/home/miles/projects/purescript/lists/output/Data.List/index.js:287:39)

Should we swap out this current version (which I assume depends on Haskell's laziness) for something that just reuses Array's sortBy? I think that might end up being faster even in situations where we're not having stack issues.

-- | Sort the elements of a list in increasing order, where elements are
-- | compared using the specified ordering.
sortBy :: forall a. (a -> a -> Ordering) -> List a -> List a
sortBy cmp = mergeAll <<< sequences
-- implementation lifted from http://hackage.haskell.org/package/base-4.8.0.0/docs/src/Data-OldList.html#sort
where
sequences :: List a -> List (List a)
sequences (a : b : xs)
| a `cmp` b == GT = descending b (singleton a) xs
| otherwise = ascending b (a : _) xs
sequences xs = singleton xs
descending :: a -> List a -> List a -> List (List a)
descending a as (b : bs)
| a `cmp` b == GT = descending b (a : as) bs
descending a as bs = (a : as) : sequences bs
ascending :: a -> (List a -> List a) -> List a -> List (List a)
ascending a as (b : bs)
| a `cmp` b /= GT = ascending b (\ys -> as (a : ys)) bs
ascending a as bs = ((as $ singleton a) : sequences bs)
mergeAll :: List (List a) -> List a
mergeAll (x : Nil) = x
mergeAll xs = mergeAll (mergePairs xs)
mergePairs :: List (List a) -> List (List a)
mergePairs (a : b : xs) = merge a b : mergePairs xs
mergePairs xs = xs
merge :: List a -> List a -> List a
merge as@(a : as') bs@(b : bs')
| a `cmp` b == GT = b : merge as bs'
| otherwise = a : merge as' bs
merge Nil bs = bs
merge as Nil = as

I'm thinking something like this would work:

sortBy :: forall a. (a -> a -> Ordering) -> List a -> List a
sortBy cmp = fromFoldable <<< Array.sortBy cmp <<< Array.fromFoldable

But it's very slow. It should be much faster if we change Array.fromFoldable to use foldl instead. Here's the existing version:

fromFoldable :: forall f. Foldable f => f ~> Array
fromFoldable = fromFoldableImpl foldr

https://github.com/purescript/purescript-arrays/blob/463dcacb99455de5e28ac4a3312bb795f293d2d9/src/Data/Array.purs#L154-L155


Edit: Another option, but still slower than the original list version (up to 10k elements, then list version overflows).

sortBy :: forall a. (a -> a -> Ordering) -> List a -> List a
sortBy cmp = fromFoldable <<< Array.sortBy cmp <<< toUnfoldable

Have you benchmarked what effect it would have by swapping fromFoldable to use foldl rather than foldr?

FYI the Haskell version depends more on the GHC runtime being smarter about stack usage, which is separate from laziness. In Haskell it’s very rare that you have to worry about stack overflows, because of how the GHC runtime works.

I just ran into this while solving Advent of Code. Any recommended workarounds in the meantime?