ekmett/charset

`isSubsetOf` is broken if the left set is complemented and the right one isn't

Opened this issue · 4 comments

Consider:

>>> import Data.CharSet
>>> let c1 = complement $ singleton 'a'
>>> let c2 = build ((/=) 'a')
>>> isSubsetOf c1 c2
False

Clearly, isSubsetOf should return True here. The problem is the following line in the implementation:

isSubsetOf (CharSet False _ i) (CharSet True _ j) = all (\x -> I.member x i && I.member x j) [ol..oh] -- not bloody likely

The fix is easy: instead of && it should be ||. But actually, we can do better:

isSubsetOf (CharSet False _ i) (CharSet True _ j) = numChars == I.size (I.union i j)

I'll link a PR to this issue with the fix.

phadej commented
I.size (I.union i j)

is not necessarily better, it builds the whole IntSet full of codepoints, just to check its size. And it cannot short circuit.

Indeed, I.size (I.union i j) is certainly not O(1). But then again, it might still be slightly more performant than the all-based approach. Have you performed a microbenchmark to see if this is the case?

Clearly, we should adopt one of the proposed fixes, although I don't have a good sense for which one just yet. (Perhaps it doesn't matter that much.)