Preserve ref-trans for HashSet: .toOrderedList instead of .toList
redbar0n opened this issue · 4 comments
Re: referential-transparency.md
(2a) … That is, hide the structural differences as internal implementation details. This gives referential transparency for both = and == for the data type.
Someone with this preference for a library would avoid exporting a function like toList for a set implementation that has no total order on the elements.
The notion of order is not contained within a set. But it is within the concept of a sequence or a list. So I imagine the clearest way to solve the problem you identified (to keep referential transparency) is that the library dealing with sets would export a .toOrderedList function that would always order the items of a set in the same way. (alternatively, more precisely: .toAscendingList and .toDescendingList)
It would also be natural, since by crossing the boundary between the concept of a set to the concept of a sequence/list then the property of an ordering needs to be added.
I imagine implementationally it would be straightforward too, as the list would be built up step by step, keeping the ordering relation intact. Could probably even use parallelized mergesort or something.
Whereas a regular .toList on a set conflates the sequential appearance of numbers in the code, i.e. in the data structure of the set, with the ordered nature of a list. So it just makes a direct translation from a set to a list, which amounts to the logical fallacy of equivocation (equivocating the different meanings of the code: only apparent order vs actual order). That’s why it ends up violating referential transparency.
That works fine if the elements of the set have a total order relation. But in some cases it is not so easy to define a total order of the elements, e.g. what if the elements are themselves also sets that can be nested any number of levels deep?
Then I imagine you would need to write another kind of ordering function, potentially recursive, to create a deterministic total order relation. Which would give the order that the resulting list should have. Or simply not afford a conversion to a list, if an ordering is impossible (I fail to imagine when that would be). In any case, treating the order of the elements in the list as the order the elements appear in the code for the set, won't do.
Maybe it was already clear in the article (or perhaps not), but note that these hash-based kinds of sets and dictionaries (aka maps) are often implemented such that all they expect of the set elements and keys is that they have a way to call a hash function on them. The implementer of the collection typically does not require that they have a comparison function that defines a total order on them. An implementer of such a hash-based data structure still might want to provide a function that can return a list of elements in some arbitrary order, as a convenience to users of the collection.
Yes, but thanks for clarifying. Imho, such a convenience function is actually essential to preserve referential transparency. So it should ideally have been a part of the implementation of the collection provided by the language.