purescript-deprecated/purescript-maps

Deprecate and move StrMap?

Closed this issue · 24 comments

Continuing discussion from #116.

StrMap seems to only earn its place in this library if it can be treated as a general-purpose map data structure with string keys. However, it's difficult to justify this: asymptotically it's significantly worse than Map String for many operations, and the order that Foldable and Traversable instances iterate over entries is unpredictable (since it is based on a JS for-in loop underneath). Moreover, it's not at all clear how we might implement StrMap for other backends, since it has various JS quirks (not just the traversal order issue, but also things like keys with names like __proto__). StrMap clearly is useful for JS interop, though.

Given all this, I'd like to consider deprecating StrMap, and moving it to a different (new?) package. I'd suggest calling the package purescript-js-objects and renaming the StrMap type to JSObject; I think that's a better reflection of what its intended use is.

Naively (without ever checking), I always thought StrMap was here because of its superior performance compared to Map String. How general is your statement "asymptotically it's significantly worse than Map String for many operations"?

Arguably, one of the basic operations of a Map is a key lookup, where I would expect the StrMap to be much faster.

Indeed, I just ran this little benchmark: Time to look up a single element by key (times are averaged over all keys in the map)
image

For these (moderate) sizes, Map String shows the expected(?) scaling, i.e. O(log(n)). StrMap seems the be almost constant, which could indicate that the logarithmically increasing time at these sizes is still dominated by some constant offset(?).

(Note: I'm not necessarily saying it's a bad idea to rename and move StrMap)

StrMap is good for lookups, but it's an ephemeral data structure. My primary reason for supporting this change is that all of it's characteristics depend on whatever the JS runtime is using for property look ups. It's implementation completely defers to FFI. It's not a general purpose, optimized hash table which can be translated to other backends (as a real PureScript implementation would). It's a very useful thing, with useful characteristics, but I don't think it belongs here. I think purescript-maps should provide portable implementations.

@sharkdp you're right, we would expect StrMap to win on lookup. It will do worse on pretty much any operation which returns a new StrMap, though: insert, update, delete, pop, update, and so on, because we have to create an entire new JavaScript object. Also, it's very easy to write functions with StrMap which are quadratic when they should be linear or O(n log n) or similar as a result of this.

paf31 commented

👍

@natefaubion Absolutely agreed. I was just surprised by "asymptotically it's significantly worse than Map String for many operations".

@hdgarrood Thank you for the clarification.

The documentation is a little misleading in using the words "to maximize performance" without making clear the situations where StrMap may or may not be performant (i.e. that like Array it's a persistant data structure implemented via cloning - just had this discussion on IRC.

As point of reference I sort-of ported purescript-maps to the Erlang backend, and while I was able to provide something usefully like StrMap, I needed to rip out Data.StrMap.ST anyway, and rewrite the mutation-based implementation of much of Data.StrMap.

It also does feel like 2 separate libraries, given there is no inter-reference and not even a shared module name...

Do we think the new purescript-js-objects library should be in core? Or maybe contrib would be better, since it wouldn't be portable?

paf31 commented

Every runtime already needs to implement records anyway, so I think it can be portable. The only non-portable bit is due to the way JS implements things like __proto__ fields, I think.

Also traversal order?

Plus:

> SM.lookup "hasOwnProperty" SM.empty :: Maybe Unit
(Just unit)

although I guess we can fix that by using Object.create(null) rather than {}.

paf31 commented

I think we can make the Traversable instance work properly and portably though, by getting a list of keys first, and then traversing in key order.

If the main use case for StrMap is JS interop, surely the current Foldable and Traversable instances are the right ones?

paf31 commented

Why's that? Is insertion order important for JS interop? I've never needed it.

Nor have I, but then again I've never used PureScript alongside a large and terrible JS codebase; I'm sure there's plenty of JavaScript code out there which does depend on iteration order. I think it's reasonable for people to expect to be able to translate for (var key in obj) {...} to for_ obj \key -> ... without anything changing; JavaScript objects have an iteration order already so I think the obvious choice is to use it.

I agree with @hdgarrood. If the goal is JS-object interop, it should be JS iteration order.

paf31 commented

Well I wasn't able to disprove the Traversable laws after a small amount of trying, so I guess it's fine from that point of view.

It seems counterintuitive for a map to be based on insertion order to me, but if it's for interop, and it is well documented, I think it's fine.

This is why I don't think Map should appear in its name :)

Just to clarify I agree that being based on insertion order is weird, but in my mind it's not based on insertion order, it's based on iteration order. These two things happen to be the same in most modern JS implementations but I'm pretty sure that iteration order is not defined by the ES specification, so we should document it as being based on iteration order, and if we do mention insertion order at all, say that it's only incidentally using insertion order because that's how iteration works in most JS implementations.

paf31 commented

What about using key order, but providing a function to iterate in the order given by for .. in?

I just want to make sure we explore all options, and having a Traversable instance based on something undocumented seems like something bad waiting to happen.

I agree that it seems like something bad waiting to happen; this is why I think we should be very clear in the docs that the JSObject type is not meant for general-purpose use. Even if we do provide a function for iterating in the order given by for ... in, I think having the Traversable instance sort keys before proceeding with the traversal would still be surprising.

paf31 commented

There's always the option of a newtype for ordering by key as well.

Right, or even just letting people do for_ (StrMap.toAscUnfoldable m :: Array _) \(Tuple _ v) -> [...]

@hdgarrood Iteration order is well-defined in the ECMAScript specification, and it's not quite insertion order: https://tc39.github.io/ecma262/#sec-ordinaryownpropertykeys

I think the order of iteration in a for...in statement is still left unspecified, as that order is based on EnumerateObjectProperties: https://tc39.github.io/ecma262/#sec-enumerate-object-properties says "The mechanics and order of enumerating the properties is not specified". It does also say that "EnumerateObjectProperties must obtain the own property keys of the target object by calling its [[OwnPropertyKeys]] internal method", but since EnumerateObjectProperties explicitly says that the order is not specified I assume it must mean that a for...in loop must iterate over everything in the list given by [[OwnPropertyKeys]], just not in any particular order.

@hdgarrood You're right, for-in order is not defined and in fact does not even need to be consistent. So PureScript couldn't provide something that exposes "the for-in iteration order" because there simply isn't one. But we could expose the keys in the order given by OrdinaryOwnPropertyKeys.