purescript/purescript-lists

Match nub* functions with Array

milesfrain opened this issue · 5 comments

The names and signatures of the nub* functions for List and Array are inconsistent.

-- List
nub :: forall a. Eq a => List a -> List a
nubBy :: forall a. (a -> a -> Boolean) -> List a -> List a

-- Array
nub :: forall a. Ord a => Array a -> Array a
nubEq :: forall a. Eq a => Array a -> Array a
nubBy :: forall a. (a -> a -> Ordering) -> Array a -> Array a
nubByEq :: forall a. (a -> a -> Boolean) -> Array a -> Array a

Proposing we make List match Array and use this lineup:

nub :: forall a. Ord a => List a -> List a
nubEq :: forall a. Eq a => List a -> List a
nubBy :: forall a. (a -> a -> Ordering) -> List a -> List a
nubByEq :: forall a. (a -> a -> Boolean) -> List a -> List a

We should also include examples and runtime info.

Thanks for the report. I think 0.14.0 is a good time to do this.

I'll work on this (can't self-assign though).

I'm thinking the simplest (and most performant) way to add Ord-based nub/nubBy to List is to just reuse Array's nub/nubBy. I don't expect the O(n) conversion cost to matter much compared to the O(n log n) and O(n^2) runtimes (todo - note these runtimes in the array docs).

We should also double-check and document whether the first, last, or unknown duplicate is kept.

Also going to add these missing functions to Data.List.Lazy and Data.List.NonEmpty.

Found that our List matches Haskell's nub and nubBy.

nub :: Eq a => [a] -> [a]
nubBy :: (a -> a -> Bool) -> [a] -> [a]

So now I'm wondering if it would be worth maintaining this consistency with Haskell, and editing Array to use this convention:

nub :: forall a. Eq a => Array a -> Array a
nubBy :: forall a. (a -> a -> Boolean) -> Array a -> Array a
nubOrd :: forall a. Ord a => Array a -> Array a
nubByOrd :: forall a. (a -> a -> Ordering) -> Array a -> Array a

This is not a place where I would want to maintain consistency with Haskell, because using Eq for nub is widely considered to have been a mistake. In many cases there is an Ord instance available, and because the performance is so much better when you do use Ord, the default option we provide should be the one that uses Ord. See also discussion in purescript/purescript-arrays#91.

If we don’t already depend on arrays I think it would be better not to add the dependency - sometimes we need to rearrange things in core, and if we have too many things depending on each other unnecessarily that becomes much more difficult to do. However, I think a direct implementation that works on lists should be relatively straightforward, if it follows the general strategy of the Array version. Instead of appending to a mutable array, I think it should be possible to build up a list in reverse order and then reverse it at the end.

As for the behaviour, we should keep the first occurrence of any element, ie we should return a list where the elements are in the same order of the first occurrences of each of them in the input. So eg nub (1:3:2:3:1:Nil) should be (1:3:2:Nil).