Poor performance of IsSuperSet
Closed this issue · 1 comments
Hi, I just started using this package and it is great. My use case relies heavily on Complement
and IsSuperSet
with ~30k bits. I noticed that IsSuperSet
is the only method in the "family" (Difference
, Union
, Intersection
, Complement
) not utilizing bitwise operations on set []uint64
so I forked the repo and changed the implementation.
Is there a specific reason for that? If not, I'm proposing a following solution. I'm not sure about the contributions policy, so I didn't send a PR but I could if that's OK with you.
I added some tests to make sure I'm not breaking anything, and ran some benchmarks to make sure I'm not making performance worse. Looks like it is significantly worse (+75%) only in case of all zeros (both sets are just 10k zero bits). See details below:
go install golang.org/x/perf/cmd/benchstat@latest
go test -bench=BenchmarkIsSuperSet -count 6 | Tee-Object -FilePath old.txt
go test -bench=BenchmarkIsSuperSet -count 6 | Tee-Object -FilePath new.txt
benchstat old.txt new.txt
goos: windows
goarch: amd64
pkg: github.com/bits-and-blooms/bitset
cpu: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
IsSuperSet/equal,_len=1-6 8.977n ± 1% 5.468n ± 2% -39.08% (p=0.002 n=6)
IsSuperSet/equal,_len=1,_strict-6 6.611n ± 0% 5.981n ± 1% -9.52% (p=0.002 n=6)
IsSuperSet/equal,_len=10-6 22.235n ± 1% 5.499n ± 1% -75.27% (p=0.002 n=6)
IsSuperSet/equal,_len=10,_strict-6 6.569n ± 1% 5.993n ± 0% -8.78% (p=0.002 n=6)
IsSuperSet/equal,_len=100-6 163.450n ± 2% 6.001n ± 0% -96.33% (p=0.002 n=6)
IsSuperSet/equal,_len=100,_strict-6 8.414n ± 1% 7.869n ± 1% -6.48% (p=0.002 n=6)
IsSuperSet/equal,_len=1000-6 1756.50n ± 0% 14.12n ± 1% -99.20% (p=0.002 n=6)
IsSuperSet/equal,_len=1000,_strict-6 23.67n ± 1% 23.64n ± 1% ~ (p=0.937 n=6)
IsSuperSet/equal,_len=10000-6 17295.0n ± 1% 103.2n ± 2% -99.40% (p=0.002 n=6)
IsSuperSet/equal,_len=10000,_strict-6 201.9n ± 0% 202.2n ± 2% ~ (p=0.290 n=6)
IsSuperSet/equal,_len=100000-6 171296.5n ± 0% 940.3n ± 1% -99.45% (p=0.002 n=6)
IsSuperSet/equal,_len=100000,_strict-6 1.853µ ± 0% 1.855µ ± 1% ~ (p=0.394 n=6)
IsSuperSet/equal,_density=0.00-6 59.94n ± 1% 105.15n ± 2% +75.41% (p=0.002 n=6)
IsSuperSet/equal,_density=0.00,_strict-6 201.8n ± 1% 203.7n ± 1% +0.94% (p=0.045 n=6)
IsSuperSet/equal,_density=0.05-6 2483.5n ± 2% 103.1n ± 1% -95.85% (p=0.002 n=6)
IsSuperSet/equal,_density=0.05,_strict-6 201.2n ± 0% 202.1n ± 1% +0.47% (p=0.002 n=6)
IsSuperSet/equal,_density=0.20-6 7662.0n ± 0% 104.0n ± 1% -98.64% (p=0.002 n=6)
IsSuperSet/equal,_density=0.20,_strict-6 201.4n ± 0% 204.9n ± 1% +1.74% (p=0.002 n=6)
IsSuperSet/equal,_density=0.80-6 26343.0n ± 0% 103.3n ± 2% -99.61% (p=0.002 n=6)
IsSuperSet/equal,_density=0.80,_strict-6 201.5n ± 0% 202.7n ± 0% +0.60% (p=0.006 n=6)
IsSuperSet/equal,_density=0.95-6 31173.5n ± 0% 103.6n ± 1% -99.67% (p=0.002 n=6)
IsSuperSet/equal,_density=0.95,_strict-6 202.0n ± 1% 202.9n ± 1% ~ (p=0.143 n=6)
IsSuperSet/equal,_density=1.00-6 32853.0n ± 1% 103.5n ± 3% -99.68% (p=0.002 n=6)
IsSuperSet/equal,_density=1.00,_strict-6 201.4n ± 0% 204.0n ± 1% +1.34% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=0-6 4.731n ± 0% 3.919n ± 1% -17.17% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=0,_strict-6 201.3n ± 0% 204.0n ± 1% +1.37% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=100-6 179.250n ± 0% 4.441n ± 1% -97.52% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=100,_strict-6 201.2n ± 0% 202.7n ± 1% +0.72% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=1000-6 1755.50n ± 0% 13.07n ± 6% -99.26% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=1000,_strict-6 203.1n ± 1% 203.7n ± 1% ~ (p=0.457 n=6)
IsSuperSet/subset,_len=10000,_diff=9999-6 17310.5n ± 1% 106.0n ± 1% -99.39% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=9999,_strict-6 201.4n ± 0% 202.3n ± 1% ~ (p=0.065 n=6)
IsSuperSet/superset,_len=10000,_diff=0-6 17289.0n ± 0% 102.5n ± 1% -99.41% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=0,_strict-6 17572.5n ± 2% 312.4n ± 4% -98.22% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=100-6 17287.0n ± 0% 103.0n ± 2% -99.40% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=100,_strict-6 17521.0n ± 0% 308.6n ± 2% -98.24% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=1000-6 17369.0n ± 2% 104.5n ± 2% -99.40% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=1000,_strict-6 17815.0n ± 1% 309.8n ± 1% -98.26% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=9999-6 17294.0n ± 1% 102.8n ± 1% -99.41% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=9999,_strict-6 17499.0n ± 1% 308.9n ± 5% -98.23% (p=0.002 n=6)
geomean 823.2n 76.60n -90.69%
Please do issue a PR. There is no need to argue, we'll take it!!! :-)