"Vector" instances are questionable
johnynek opened this issue · 21 comments
@non @tixxit and I were talking about mapVectorEq
and for instance the Group
on Map[K, V]
that assumes that a missing key is the same as Group.empty
.
This has some issues. Without the correct Eq[Map[K, V]]
in scope, you violate the laws. Moreover, the standard Eq
should not be one that has this behavior, because that does not make sense for a lot of use cases.
Our solution is to wrap such instances:
// infinite map defined at all keys K
trait SparseMap[K, V] {
def get(k: K): V
def updated(k: K, v: V): SparseMap[K, V]
def toMap: Map[K, V]
}
// an infinite vector that is defined at all points (i: Int)
trait SparseVector[T] {
def apply(i: Int): T
def updated(i: Int, v: T): SparseVector[T]
def toVector: Vector[T]
}
In this way #126 would be an instance on SparseMap
The solution by defining a new type for these versions of Map
and Vector
sounds good. What is the motivation behind naming them Vector*
?
@TomasMikula I believe the Vector
prefix is in reference to vector equality like in linear algebra
Oh, I see, but you mean infinite-dimensional vectors, right? Then a more descriptive name would be Infinite{Vector|Map}
instead of Vector{Vector|Map}
, wouldn't it?
I think @johnynek mentioned SparseMap as a possibility, which is descriptive too. In the case of Vector
it's "really" a vector with Int.MaxValue elements too :\
"Sparse" sounds good!
👍 Sparse
On the issue of names: I got a library https://github.com/rklaehn/abc where I have implementations of Seq and Map that allow defining a lot of typeclass instances without having a strange Eq
. I called them TotalArrayMap
/ TotalArraySeq
. Whether they are sparse or not is an implementation detail IMHO. The important thing is that in both cases the apply method is total, because there is a default value in case no mapping is provided.
https://github.com/rklaehn/abc#-totalarraymapk-v
But sparse is also fine. Just pease don't have a strange Eq
on the default implicit scope just to make normal map work...
So, what would be the requirements on the "default" value in the sparse vector/map?
- No requirements. It can be any value, and is provided at map/vector creation time.
- It is a "zero" of some algebraic structure, such as Monoid zero, or Lattice bottom.
I see @rklaehn does 1. I see a danger of accidentally getting an instance of, say, Monoid[SparseMap[K, V]]
(given Monoid[V]
), such that the map's default value is not the same as Monoid[V].zero
.
The problem of 2. is: which zero should it be? Monoid zero? Lattice bottom? Monoid might seem the most general requirement: lattice bottom and the join operation form a monoid; but then so do lattice top and the meet operation. And neither might be the most natural monoid on V
.
The empty value of a Monoid[TotalArrayMap[K, V]]
is a TotalArrayMap with no elements and the default value of Monoid[V].empty
. So unless you have different Monoid instances in scope at different places, everything is consistent. See https://github.com/rklaehn/abc/blob/master/core/src/main/scala/com/rklaehn/abc/TotalArrayMap.scala#L147
If you have m1, m2: TotalArrayMap[K, V]
with some default value other than Monoid[V].empty
, and then combine them using Monoid[TotalArrayMap[K, V]].combine
, what is the default value of the result? (I know it is well defined and I can see from the source code that it is the default value of m1
.) My point is, I probably never wanted to use a monoid with a different empty value than my map's default value. So I now have a bug in my code that is hard to spot.
The default value of the result will be result.default === Monoid[V].combine(m1.default, m2.default)
. I don't see how it could be any other way, because for every k, result(k) === Monoid[V].combine(m1(k), m2(k))
must be true.
There is a "fast path" for when the default of both m1 and m2 is Monoid[V].empty. In that case you do not have to perform the combine op at all for values that aren't in both maps. But that is just an implementation detail.
import com.rklaehn.abc._
import cats.implicits._
import algebra._
val a = TotalArrayMap.fromDefault[Int,Int](1)
val b = TotalArrayMap.fromDefault[Int,Int](2)
scala> Monoid[TotalArrayMap[Int, Int]].combine(a, b).default
res3: Int = 3
I shouldn't embarrass myself by making false statements about code I haven't fully read :) Yeah, how you handle that situation seems the only correct way.
Still, the concern about accidentally combining maps with different default values remains. (Not that I have a better solution. I'm just trying to understand potential dangers of either approach.)
I think doing something like this would be the way to go. TotalMap
TotalVector
seems fine to me.
I agree that Sparse
seems to imply the default value is empty
but that ties creation of these things to a Monoid
which might not be ideal for reasons mentioned.
I don't think you have a problem if each instance of the class carries their own default value (and it won't cost much in storage, since it is just one more value).
The problem can occur precisely when each instance carries their own default value, and you accidentally combine maps with different default values (unless you prohibit that at runtime).
Ultimately, the problem with both approaches is kind of the same:
- possibility to accidentally use different default values;
- possibility to accidentally use different type class instances for the same type (and thus different default values).
Maybe a good compromise would be Monoid[SparseMap[K, V]]
throwing a runtime exception when you try to combine maps with different default values, while using SparseMap[K, V]#combine(rhs, f)
directly would still allow you to do that. Thoughts?
SparseMap[K, V]#combine(rhs, f)
signature could even be
def combine[U, W](rhs: SparseMap[K, U], f: (V, U) ⇒ W): SparseMap[K, W]
(changing the value type to W
), so changing the default value must stay allowed there.
I'm confused, I am proposing:
trait TotalMap[K, V] {
def nonDefaultKeys: Set[K] // maybe private[algebra]
def apply(k: K): V
def update(k: K, v: V): TotalMap[K, V]
}
object TotalMap {
// default is embedded in TotalMap, not in any algebra instance.
def empty[K, V](default: V): TotalMap[K, V] = ...
}
So, what is the risk? when you combine you do:
def combine(a: TotalMap[K, V], b: TotalMap[K, V]): TotalMap[K, V] = {
val res = TotalMap.empty[K, V](Monoid[V].empty)
a.nonDefaultKeys.union(b.nonDefaultKeys).foldLeft(res) { (old, k) =>
old.update(k, a(k) + b(k))
}
}
sorry if I'm being dense, but is there a risk there?
Suppose
a.defaultValue + b.defaultValue != Monoid[V].empty
Pick a key k
that is absent from both a
and b
. One would expect
combine(a, b)(k) == a(k) + b(k)
== a.defaultValue + b.defaultValue
but instead it is
combine(a, b)(k) == Monoid[V].empty
!= a.defaultValue + b.defaultValue
@TomasMikula indeed, thanks, so fixing it to:
trait TotalMap[K, V] {
def nonDefaultKeys: Set[K] // maybe private[algebra]
def defaultValue: V // maybe private[algebra]
def apply(k: K): V
def update(k: K, v: V): TotalMap[K, V]
}
object TotalMap {
// default is embedded in TotalMap, not in any algebra instance.
def empty[K, V](default: V): TotalMap[K, V] = ...
}
def combine(a: TotalMap[K, V], b: TotalMap[K, V]): TotalMap[K, V] = {
val res = TotalMap.empty[K, V](a.defaultValue + b.defaultValue)
a.nonDefaultKeys.union(b.nonDefaultKeys).foldLeft(res) { (old, k) =>
old.update(k, a(k) + b(k))
}
}
that works, right?
That's basically what @rklaehn does in his library. I find this correct. Note, however, that there is no Monoid
(or BoundedJoinSemilattice
) involved yet, here. If this was the behavior of Monoid[TotalMap[K, V]].combine
, it could be a nasty surprise that the default value of the resulting map is not Monoid[V].empty
. That's why I suggested that if we were to provide Monoid
instance for TotalMap
, that monoid's combine
method should check (at runtime) that the defaultValue
s of the operands are both Monoid[V].empty
.
- possibility to accidentally use different type class instances for the same type (and thus different default values).
If you ever do that in a scala program, you are basically in hell. E.g. you build a RedBlackTree using one order, and then try to get values out of it while another order is in the implicit scope. Capturing typeclass instances at creation is also no solution, because then you lose most of the advantages of a typeclass-based design.
This is a problem with the lack of typeclass coherence. See https://www.youtube.com/watch?v=hIZxTQP1ifo
Maybe a good compromise would be Monoid[SparseMap[K, V]] throwing a runtime exception when you try to combine maps with different default values, while using SparseMap[K, V]#combine(rhs, f) directly would still allow you to do that. Thoughts?
I want all methods on my data structures to be total. So throwing an exception is not ever a good solution for me.
Suppose
a.defaultValue + b.defaultValue != Monoid[V].empty
Pick a key k that is absent from both a and b. One would expectcombine(a, b)(k) == a(k) + b(k)
== a.defaultValue + b.defaultValue
but instead it iscombine(a, b)(k) == Monoid[V].empty
!= a.defaultValue + b.defaultValue
No, the result will be whatever the operation is applied to the default values. So if you add two maps, the default values will also be added. Everything's fine. combine
in my code is not necessarily the monoid combine, but whatever the current op is.
Basically a total map is just a tabulated function. Once it is created, the fact that it is backed by a finite number of mappings is an implementation detail. It must compose like a function, so
a(k) op b(k) === (a op b)(k) for all k (including those k where neither a nor b have keys)
No, the result will be whatever the operation is applied to the default values.
In your library, yes. My comment was about @johnynek's code posted above.
When we moved to cats-kernel we removed these instances.