bits-and-blooms/bitset

Poor performance of IsSuperSet

Closed this issue · 1 comments

nemars commented

Hi, I just started using this package and it is great. My use case relies heavily on Complement and IsSuperSet with ~30k bits. I noticed that IsSuperSet is the only method in the "family" (Difference, Union, Intersection, Complement) not utilizing bitwise operations on set []uint64 so I forked the repo and changed the implementation.

Is there a specific reason for that? If not, I'm proposing a following solution. I'm not sure about the contributions policy, so I didn't send a PR but I could if that's OK with you.

I added some tests to make sure I'm not breaking anything, and ran some benchmarks to make sure I'm not making performance worse. Looks like it is significantly worse (+75%) only in case of all zeros (both sets are just 10k zero bits). See details below:

go install golang.org/x/perf/cmd/benchstat@latest
go test -bench=BenchmarkIsSuperSet -count 6 | Tee-Object -FilePath old.txt
go test -bench=BenchmarkIsSuperSet -count 6 | Tee-Object -FilePath new.txt
benchstat old.txt new.txt
goos: windows
goarch: amd64
pkg: github.com/bits-and-blooms/bitset
cpu: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
                                                    │    old.txt     │               new.txt               │
                                                    │     sec/op     │    sec/op     vs base               │
IsSuperSet/equal,_len=1-6                                8.977n ± 1%    5.468n ± 2%  -39.08% (p=0.002 n=6)
IsSuperSet/equal,_len=1,_strict-6                        6.611n ± 0%    5.981n ± 1%   -9.52% (p=0.002 n=6)
IsSuperSet/equal,_len=10-6                              22.235n ± 1%    5.499n ± 1%  -75.27% (p=0.002 n=6)
IsSuperSet/equal,_len=10,_strict-6                       6.569n ± 1%    5.993n ± 0%   -8.78% (p=0.002 n=6)
IsSuperSet/equal,_len=100-6                            163.450n ± 2%    6.001n ± 0%  -96.33% (p=0.002 n=6)
IsSuperSet/equal,_len=100,_strict-6                      8.414n ± 1%    7.869n ± 1%   -6.48% (p=0.002 n=6)
IsSuperSet/equal,_len=1000-6                           1756.50n ± 0%    14.12n ± 1%  -99.20% (p=0.002 n=6)
IsSuperSet/equal,_len=1000,_strict-6                     23.67n ± 1%    23.64n ± 1%        ~ (p=0.937 n=6)
IsSuperSet/equal,_len=10000-6                          17295.0n ± 1%    103.2n ± 2%  -99.40% (p=0.002 n=6)
IsSuperSet/equal,_len=10000,_strict-6                    201.9n ± 0%    202.2n ± 2%        ~ (p=0.290 n=6)
IsSuperSet/equal,_len=100000-6                        171296.5n ± 0%    940.3n ± 1%  -99.45% (p=0.002 n=6)
IsSuperSet/equal,_len=100000,_strict-6                   1.853µ ± 0%    1.855µ ± 1%        ~ (p=0.394 n=6)
IsSuperSet/equal,_density=0.00-6                         59.94n ± 1%   105.15n ± 2%  +75.41% (p=0.002 n=6)
IsSuperSet/equal,_density=0.00,_strict-6                 201.8n ± 1%    203.7n ± 1%   +0.94% (p=0.045 n=6)
IsSuperSet/equal,_density=0.05-6                        2483.5n ± 2%    103.1n ± 1%  -95.85% (p=0.002 n=6)
IsSuperSet/equal,_density=0.05,_strict-6                 201.2n ± 0%    202.1n ± 1%   +0.47% (p=0.002 n=6)
IsSuperSet/equal,_density=0.20-6                        7662.0n ± 0%    104.0n ± 1%  -98.64% (p=0.002 n=6)
IsSuperSet/equal,_density=0.20,_strict-6                 201.4n ± 0%    204.9n ± 1%   +1.74% (p=0.002 n=6)
IsSuperSet/equal,_density=0.80-6                       26343.0n ± 0%    103.3n ± 2%  -99.61% (p=0.002 n=6)
IsSuperSet/equal,_density=0.80,_strict-6                 201.5n ± 0%    202.7n ± 0%   +0.60% (p=0.006 n=6)
IsSuperSet/equal,_density=0.95-6                       31173.5n ± 0%    103.6n ± 1%  -99.67% (p=0.002 n=6)
IsSuperSet/equal,_density=0.95,_strict-6                 202.0n ± 1%    202.9n ± 1%        ~ (p=0.143 n=6)
IsSuperSet/equal,_density=1.00-6                       32853.0n ± 1%    103.5n ± 3%  -99.68% (p=0.002 n=6)
IsSuperSet/equal,_density=1.00,_strict-6                 201.4n ± 0%    204.0n ± 1%   +1.34% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=0-6                   4.731n ± 0%    3.919n ± 1%  -17.17% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=0,_strict-6           201.3n ± 0%    204.0n ± 1%   +1.37% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=100-6               179.250n ± 0%    4.441n ± 1%  -97.52% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=100,_strict-6         201.2n ± 0%    202.7n ± 1%   +0.72% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=1000-6              1755.50n ± 0%    13.07n ± 6%  -99.26% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=1000,_strict-6        203.1n ± 1%    203.7n ± 1%        ~ (p=0.457 n=6)
IsSuperSet/subset,_len=10000,_diff=9999-6              17310.5n ± 1%    106.0n ± 1%  -99.39% (p=0.002 n=6)
IsSuperSet/subset,_len=10000,_diff=9999,_strict-6        201.4n ± 0%    202.3n ± 1%        ~ (p=0.065 n=6)
IsSuperSet/superset,_len=10000,_diff=0-6               17289.0n ± 0%    102.5n ± 1%  -99.41% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=0,_strict-6       17572.5n ± 2%    312.4n ± 4%  -98.22% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=100-6             17287.0n ± 0%    103.0n ± 2%  -99.40% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=100,_strict-6     17521.0n ± 0%    308.6n ± 2%  -98.24% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=1000-6            17369.0n ± 2%    104.5n ± 2%  -99.40% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=1000,_strict-6    17815.0n ± 1%    309.8n ± 1%  -98.26% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=9999-6            17294.0n ± 1%    102.8n ± 1%  -99.41% (p=0.002 n=6)
IsSuperSet/superset,_len=10000,_diff=9999,_strict-6    17499.0n ± 1%    308.9n ± 5%  -98.23% (p=0.002 n=6)
geomean                                                  823.2n         76.60n       -90.69%
lemire commented

Please do issue a PR. There is no need to argue, we'll take it!!! :-)