`isSubsetOf` is broken if the left set is complemented and the right one isn't
Opened this issue · 4 comments
mcschroeder commented
Consider:
>>> import Data.CharSet
>>> let c1 = complement $ singleton 'a'
>>> let c2 = build ((/=) 'a')
>>> isSubsetOf c1 c2
False
Clearly, isSubsetOf
should return True
here. The problem is the following line in the implementation:
isSubsetOf (CharSet False _ i) (CharSet True _ j) = all (\x -> I.member x i && I.member x j) [ol..oh] -- not bloody likely
The fix is easy: instead of &&
it should be ||
. But actually, we can do better:
isSubsetOf (CharSet False _ i) (CharSet True _ j) = numChars == I.size (I.union i j)
I'll link a PR to this issue with the fix.
phadej commented
I.size (I.union i j)
is not necessarily better, it builds the whole IntSet
full of codepoints, just to check its size. And it cannot short circuit.
mcschroeder commented
Ah, I see. I thought size on `IntSet` was O(1). Oh well, I guess it can’t be helped then.
RyanGlScott commented
Indeed, I.size (I.union i j)
is certainly not O(1). But then again, it might still be slightly more performant than the all
-based approach. Have you performed a microbenchmark to see if this is the case?
Clearly, we should adopt one of the proposed fixes, although I don't have a good sense for which one just yet. (Perhaps it doesn't matter that much.)
mcschroeder commented
I’ve only played around with the different versions in the REPL while trying to diagnose the problem, and it seemed to me at the time that the `I.size` solution was noticeably faster—but I wouldn’t bet on it, might’ve been some other effect.
I’ll run some benchmarks when I have the time in the next couple of days :)