purescript-deprecated/purescript-maps

toUnfoldable leaks structure

Opened this issue · 7 comments

toUnfoldable currently leaks structure by doing a breadth-first traversal of the Map.

Potential solutions, subsequent to benchmarking:

  1. If it turns out that toUnfoldable really isn't faster than toAscUnfoldable, then toUnfoldable = toAscUnfoldable should be a simple fix. It would break anything that was depending on the order of toUnfoldable, which is not what people should be doing. Maybe you also deprecate toAscUnfoldable.

  2. Put some sort of constraint on toUnfoldable to ensure that it unfolds to a structure where ordering is not important?

  3. Place a strongly worded, ugly warning comment on toUnfoldable.

Other ideas?

There are some benchmarks here showing toUnfoldable to be significantly faster than toAscUnfoldable: #79 (comment)
Will try to run these again, since there have been some compiler changes since that time, to see if there is still a significant difference.

I've done some benchmarking:

Sadly, toAscUnfoldable is over 10% slower than toUnfoldable on larger inputs. Example:

toUnfoldable: big map (1000000)
mean   = 805.20 ms
stddev = 91.14 ms
min    = 687.07 ms
max    = 927.82 ms

toAscUnfoldable: big map (1000000)
mean   = 910.99 ms
stddev = 6.16 ms
min    = 902.18 ms
max    = 920.25 ms

My inclination is to say that this means that toUnfoldable is a function that still has its uses in cases where the user doesn't care about order. There just needs to be a constraint or strong warning of some sort. @hdgarrood any thoughts here?

My inclination is to say that this means that toUnfoldable is a function that still has its uses in cases where the user doesn't care about order. There just needs to be a constraint or strong warning of some sort.

Yes, I think this makes sense. It's a breaking change, but I'd like to have toUnfoldable be the 'safe' one, i.e. the one which gives you back results in ascending order of keys, and the current toUnfoldable be renamed to something like toUnfoldableUnsafe with a note that depending on the order of the structure returned can result in unpredictable behaviour, and in most cases you should use toUnfoldable instead.

a note that depending on the order of the structure returned can result in unpredictable behaviour, and in most cases you should use toUnfoldable instead.

I have to say that I really like the idea that a constraint could be used for something like this, too, in the same way that Partial goes a step further than simply putting a warning in docs. Now I imagine situations like this one are pretty rare, so it might not be worth the trouble, but what would a "this is only safe when order is irrelevant" constraint look like?

Right; it's difficult for me to imagine what that would look like, though. One way to expose this safely would be to define something like foldCommutative :: forall m k v. Commutative m => Monoid m => (k -> v -> m) -> Map k v -> m with implementation \f -> foldMap f <<< (toUnfoldableUnsafe :: _ -> Array _)... although if you are dealing with a performance bottleneck you're probably going to want to avoid foldMap anyway.

One way to expose this safely would be to define something like foldCommutative :: forall m k v. Commutative m => Monoid m => (k -> v -> m) -> Map k v -> m with implementation \f -> foldMap f <<< (toUnfoldableUnsafe :: _ -> Array _)... although if you are dealing with a performance bottleneck you're probably going to want to avoid foldMap anyway.

As a thought experiment, I'm wondering if this would necessarily be true with the right compiler optimizations... I would naïvely think that the compiler might to be able to optimize a sum written in terms of foldMap to be (near-)equivalent to its foldl version. As Int under addition is a Commutative Monoid, you would then be able to realize the perf benefits of unordered traversals.