`#intersect` is algorithmically incorrect

Question

`#intersect` is algorithmically incorrect

suneettipirneni opened this issue 3 years ago · 0 comments

Feature

Current implementation Quirks

1.) `intersect` isn't commutative, even though it should be

When taking the intersection (∩) of two sets T and S:

𝑆 ∩ 𝑇 = 𝑇 ∩ 𝑆

Order doesn't affect the final outcome of the intersection. This makes sense since we're dealing with equality of elements within a given set and if elements are equal in both sets, the element is kept. Order has no involvement here.

Now let's see if the Collection#intersect respects this property of set intersection:

Sample code:

const T = new Collection<string, string>();
const S = new Collection<string, string>();

T.set('foo', 'bar');
T.set('left', 'right');

S.set('foo', 'baz');
S.set('left', 'other-left');

console.log(T.intersect(S));
console.log(S.intersect(T));

Output:

Collection(2) [Map] { 'foo' => 'baz', 'left' => 'other-left' }
Collection(2) [Map] { 'foo' => 'bar', 'left' => 'right' }

The output for each intersection is different. This means it isn't a true intersection because proper intersections are commutative. IE T.intersect(S) and S.intersect(T) should yield values that are exactly equal.

We can look at the current implementation to see why this is caused:

public intersect(other: Collection<K, V>) {
	const coll = new this.constructor[Symbol.species]<K, V>();
	for (const [k, v] of other) {
		if (this.has(k)) coll.set(k, v);
	}
	return coll;
}

This method only considers values from the right hand side of the intersection. More specifically it only uses keys as the basis of equality. Which brings me to another design oversight.

2.) Key-Based intersection doesn't actually intersect a `Map`-like structure.

Key-based intersection is just that, an intersection of keys within the key set of a map. However the issue comes into play once you try to tie values in within those keys. You never intersected the values, you only intersected the keys. The act of associating values in the intersection, that haven't been intersected, means that this is no longer an actual intersection.

Ideal solution or implementation

Compare Maps as a set of entries, not as a key-value store

To fix this, we need to change the way we think of map comparisons. Everything I stated above applies to sets. Well... we have an issue here, Map's aren't sets. Well in terms of K/V access they aren't. However in terms of implementation they are definitely are. Hence we have:

Map#entries

Now we can consider the map as a set rather than just a pure K/V access object. This means we can now apply the proper type of intersection on this map.

Let's try a proper intersection on the previous code example:

const T = new Collection<string, string>();
const S = new Collection<string, string>();

T.set('foo', 'bar');
T.set('left', 'right');

S.set('foo', 'baz');
S.set('left', 'other-left');

console.log(T.intersect(S));
console.log(S.intersect(T));

Output:

Collection(0) [Map] {  }
Collection(0) [Map] {  }

Wait what? The intersect is empty?!

Yup, this is completely intentional. We're now taking values into account which means if a given entry has the same key but differing values, it's no longer in the intersection. This preserves the true function of a proper intersection.

I want to note now intersect is commutative since T.intersect(S) is the same set as S.intersect(T).

But isn't that less useful?

Well what use cases are there for key-based intersection?

Also if the library is deciding which collection values to keep and which ones not to keep, doesn't this make the whole of idea of intersecting collections more confusing.

Usually if you want to intersect maps you have two maps that are homogenous in terms of key types and value types. So in most cases the "proper" intersection won't affect those cases.

For example intersecting roles of a user with one given in a list, has the same effect in both implementations. It just so happens that the algorithm I'm proposing is actually correct for all cases not just a certain cases.

How would this be implemented?

For key equality the same method of using Map#has would be used. Under the hood this method uses the SameValueZero algorithm for testing equality. Object.is is the API-equivalent implementation of SameValueZero. So naturally it would also be used for map-value equality.

public intersect(other: Collection<K, V>) {
	const coll = new this.constructor[Symbol.species]<K, V>();
	for (const [k, v] of other) {
        const thisV = this.get(k)
		if (thisV) {
			if (Object.is(thisV, v) {
             	coll.set(k, v);
             }
		}
	}
	return coll;
}

Now the intersection method does correct intersections, and avoids the pitfalls of the previous implementations.

Ok, but there are some use cases where I find key-based map intersection useful

Understood, however this isn't something the library should be implementing. The is because as stated above the library obfuscated the precedence it uses for intersections. And it would be confusing for a method literally called intersect to not perform a proper intersection.

Instead this functionality should be implemented by the user. Doing so, is quite trivial:

declare const T: Collection<string, string>;
declare const S: Collection<string, string>;

// I as a user can control precedence, in this case T has value precedence over S
// however I can change that freely it doesn't fit my use case.
const customIntersected = new Collection<string, string>();
T.filter((_v, k) => S.has(k)).forEach(customIntersected.set);

With the user constructing they're own key-based intersection it returns control to the user to set precedence, and doesn't rely on an implementation that is opinionated in terms of collection precedence.

Alternative solutions or implementations

We could remove intersect, but I don't think that's very ideal.

Other context

No response