NextSetMany is broken and probably not a good idea.
FlorianUekermann opened this issue · 4 comments
Unfortunately, I don't have a shorter example. This example will loop indefinitely:
package main

import (
	"fmt"
	"math"
	"math/rand"

	"github.com/willf/bitset"
)

func main() {
	var bits = make([]uint64, 68) // alternatively (1<<30)/8
	setRnd(bits, 4)
	fmt.Println("Checksum", batched(bits))
}

func batched(input []uint64) (checksum uint) {
	var bits = bitset.From(input)
	var buffer [256]uint
	var last, batch = bits.NextSetMany(0, buffer[:])
	fmt.Println(last, batch)
	for len(batch) > 0 {
		for _, idx := range batch {
			checksum += idx
		}
		last, batch = bits.NextSetMany(last+1, batch)
		fmt.Println(last, batch)
	}
	return checksum
}

var rnd = rand.NewSource(0).(rand.Source64)

func setRnd(bits []uint64, halvings int) {
	for i := range bits {
		bits[i] = math.MaxUint64
		for j := 0; j < halvings; j++ {
			bits[i] &= rnd.Uint64()
		}
	}
}
My benchmarks indicate that NextSetMany is conceptually a bad idea as well. Depending on the circumstances (bitset length and density of ones), the following is equally fast or faster:
func good(set []uint64) (checksum uint) {
	for rep := 0; rep < numReps; rep++ { // numReps: benchmark repetition count, defined elsewhere
		for wordIdx, word := range set {
			var wordIdx = uint(wordIdx * 64)
			for word != 0 {
				var bitIdx = uint(bits.TrailingZeros64(word)) // bits is math/bits
				word ^= 1 << bitIdx
				// Do something with the result of the next line
				var index = wordIdx + bitIdx
				checksum += index // stand-in for real work on the set bit
			}
		}
	}
	return checksum
}
I'm sure you can turn this into an iterator without too much effort, or just take a function pointer (see the sketch at the end of this comment). Any serious work to be done will dwarf the overhead of the function pointer or iterator call anyway. In particular, in the case with many zeros (which is where the NextSet functions are useful), the relatively naive function above is much faster.
Note that the above is a fairly naive implementation, which is nowhere close to optimal. There is, at the very least, a superfluous word == 0 check in bits.TrailingZeros64 which could be skipped, since the loop condition already guarantees word != 0. But I suspect there are even more gains to be had with a little effort.
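For illustration, here is a minimal sketch of the function-pointer idea. forEachSet is just a made-up name, not part of the bitset API, and it assumes the math/bits package is imported:

func forEachSet(set []uint64, fn func(idx uint)) {
	for wordIdx, word := range set {
		var base = uint(wordIdx) * 64
		for word != 0 {
			// word != 0 is guaranteed here, so the zero check inside TrailingZeros64 is redundant
			fn(base + uint(bits.TrailingZeros64(word)))
			word &= word - 1 // clear the lowest set bit
		}
	}
}

A caller would pass a closure, e.g. forEachSet(input, func(idx uint) { checksum += idx }); the closure call is the only per-bit overhead added on top of the raw loop.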
My benchmarks indicate that NextSetMany is conceptually a bad idea as well. Depending on the circumstances (bitset length and density of ones), the following is equally fast or faster:
Let us try it out... Here is your function...
func good(set []uint64) (checksum uint) {
	for wordIdx, word := range set {
		var wordIdx = uint(wordIdx * 64)
		for word != 0 {
			var bitIdx = uint(trailingZeroes64(word))
			word ^= 1 << bitIdx
			// Do something with the result of the next line
			var index = wordIdx + bitIdx
			checksum += index
		}
	}
	return checksum
}
Here is a related benchmark...
func BenchmarkFlorianUekermannIterateManyComp(b *testing.B) {
	var input = make([]uint64, 68)
	setRnd(input, 4)
	b.ResetTimer()
	var checksum = uint(0)
	for i := 0; i < b.N; i++ {
		checksum += good(input)
	}
	if checksum == 0 { // added just to fool ineffassign
		return
	}
}
Here is the version with our current API:
func BenchmarkFlorianUekermannIterateMany(b *testing.B) {
	var input = make([]uint64, 68)
	setRnd(input, 4)
	var bitmap = From(input)
	b.ResetTimer()
	var checksum = uint(0)
	for i := 0; i < b.N; i++ {
		buffer := make([]uint, 256)
		var last, batch = bitmap.NextSetMany(0, buffer)
		for len(batch) > 0 {
			for _, idx := range batch {
				checksum += idx
			}
			last, batch = bitmap.NextSetMany(last+1, batch)
		}
	}
	if checksum == 0 { // added just to fool ineffassign
		return
	}
}
Here are the results on a Linux server...
$ go version
go version go1.9 linux/amd64
dlemire@skylake:~/go/src/github.com/willf/bitset$ go test -bench=BenchmarkFlorianUekermannIterateMany
{0,1,2,3,4,5,6,7,8,9}
goos: linux
goarch: amd64
pkg: github.com/willf/bitset
BenchmarkFlorianUekermannIterateMany 3000000 523 ns/op
BenchmarkFlorianUekermannIterateManyReg 1000000 1738 ns/op
BenchmarkFlorianUekermannIterateManyComp 2000000 618 ns/op
Here are the results on my laptop...
$ go version
$ go test -bench=BenchmarkFlorianUekermannIterateMany
{0,1,2,3,4,5,6,7,8,9}
goos: darwin
goarch: amd64
pkg: github.com/willf/bitset
BenchmarkFlorianUekermannIterateMany-8 3000000 496 ns/op
BenchmarkFlorianUekermannIterateManyReg-8 1000000 1684 ns/op
BenchmarkFlorianUekermannIterateManyComp-8 3000000 579 ns/op
So according to these tests, your version is systematically slower.
I'm sure you can turn this into an iterator without too much effort
Pull request invited. Please also provide benchmarks showing that your code is faster.
Please upgrade to version 1.1.6, the latest version.
You'll want to test larger bitsets as well as lower densities. 68 and 4 were chosen only because that's where the bug shows up; they could have been anything.
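For example (sizes and number of halvings here are arbitrary, just for illustration), a larger and sparser input can be generated with the same setRnd helper from my first post:

var input = make([]uint64, 1<<20) // about 67 million bits instead of 68*64
setRnd(input, 10)                 // each bit survives with probability 2^-10, i.e. roughly 0.1% density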
It also seems like you replaced the TrailingZeros64 function; there's something odd going on with the compiler if I do that on my machine (not sure what, I didn't have time to check the generated code today).
I won't send a PR, since we aren't using this library. I just stumbled over the bug while checking the claim about batching. I'll retest with the fixed code, but I doubt it makes a big difference.
In the latest version, I have added tests at various densities. When the bitset is super low-density, the bulk of the time is just spent scanning zero words (arguably an indication that you shouldn't be using a bitset at all), so all methods are pretty much equivalent.
Otherwise, the NextSetMany method is much faster than NextSet, as I reported.
Surprisingly (to me), NextSetMany even beats your hand-crafted (non-iterated) approach. It may very well be that you could "turn this into an iterator without too much effort", but I don't know how to do it. I'd be very interested in a code sample.
$ go test -bench=BenchmarkFlorianUekermann
{0,1,2,3,4,5,6,7,8,9}
goos: linux
goarch: amd64
pkg: github.com/willf/bitset
BenchmarkFlorianUekermannIterateMany 3000000 516 ns/op
BenchmarkFlorianUekermannIterateManyReg 1000000 1790 ns/op
BenchmarkFlorianUekermannIterateManyComp 2000000 618 ns/op
BenchmarkFlorianUekermannLowDensityIterateMany 2000 1101767 ns/op
BenchmarkFlorianUekermannLowDensityIterateManyReg 1000 1239836 ns/op
BenchmarkFlorianUekermannLowDensityIterateManyComp 2000 1101817 ns/op
BenchmarkFlorianUekermannMidDensityIterateMany 100 12572068 ns/op
BenchmarkFlorianUekermannMidDensityIterateManyReg 50 25299202 ns/op
BenchmarkFlorianUekermannMidDensityIterateManyComp 100 14618127 ns/op
BenchmarkFlorianUekermannMidStrongDensityIterateMany 30 40667044 ns/op
BenchmarkFlorianUekermannMidStrongDensityIterateManyReg 10 104326575 ns/op
BenchmarkFlorianUekermannMidStrongDensityIterateManyComp 30 51650557 ns/op