NextSetMany is broken and probably not a good idea.
FlorianUekermann opened this issue · 4 comments
Unfortunately, I don't have a shorter example. This example will loop indefinitely:
package main

import (
	"fmt"
	"math"
	"math/rand"

	"github.com/willf/bitset"
)

func main() {
	var bits = make([]uint64, 68) // alternatively (1<<30)/8
	setRnd(bits, 4)
	fmt.Println("Checksum", batched(bits))
}

func batched(input []uint64) (checksum uint) {
	var bits = bitset.From(input)
	var buffer [256]uint
	var last, batch = bits.NextSetMany(0, buffer[:])
	fmt.Println(last, batch)
	for len(batch) > 0 {
		for _, idx := range batch {
			checksum += idx
		}
		last, batch = bits.NextSetMany(last+1, batch)
		fmt.Println(last, batch)
	}
	return checksum
}

var rnd = rand.NewSource(0).(rand.Source64)

func setRnd(bits []uint64, halvings int) {
	for i := range bits {
		bits[i] = math.MaxUint64
		for j := 0; j < halvings; j++ {
			bits[i] &= rnd.Uint64()
		}
	}
}
My benchmarks indicate that NextSetMany is conceptually a bad idea as well. Depending on the circumstances (bitset length and density of ones), the following is equally fast or faster:
func good(set []uint64) (checksum uint) {
	for rep := 0; rep < numReps; rep++ { // numReps: benchmark repetition count, defined elsewhere
		for wordIdx, word := range set {
			var wordIdx = uint(wordIdx * 64)
			for word != 0 {
				var bitIdx = uint(bits.TrailingZeros64(word)) // bits is math/bits
				word ^= 1 << bitIdx
				// Do something with the result of the next line
				var index = wordIdx + bitIdx
				checksum += index // stand-in for real work on the set bit
			}
		}
	}
	return checksum
}
I'm sure you can turn this into an iterator without too much effort, or just take a function pointer (see the sketch at the end of this comment). Any serious work to be done will dwarf the overhead of the function pointer or iterator call anyway. In particular, in the case with many zeros (which is where the NextSet functions are useful), the relatively naive function above is much faster.
Note that the above is a fairly naive implementation, which is nowhere close to optimal. There is, at the very least, a superfluous word == 0 check in bits.TrailingZeros64 which could be skipped, since the loop condition already guarantees word != 0. But I suspect there are even more gains to be had with a little effort.
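For illustration, here is a minimal sketch of the function-pointer idea. forEachSet is just a made-up name, not part of the bitset API, and it assumes the math/bits package is imported:

func forEachSet(set []uint64, fn func(idx uint)) {
	for wordIdx, word := range set {
		var base = uint(wordIdx) * 64
		for word != 0 {
			// word != 0 is guaranteed here, so the zero check inside TrailingZeros64 is redundant
			fn(base + uint(bits.TrailingZeros64(word)))
			word &= word - 1 // clear the lowest set bit
		}
	}
}

A caller would pass a closure, e.g. forEachSet(input, func(idx uint) { checksum += idx }); the closure call is the only per-bit overhead added on top of the raw loop.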
My benchmarks indicate that NextSetMany is conceptually a bad idea as well. Depending on the circumstances (bitset length and density of ones), the following is equally fast or faster:
Let us try it out... Here is your function...
func good(set []uint64) (checksum uint) {
	for wordIdx, word := range set {
		var wordIdx = uint(wordIdx * 64)
		for word != 0 {
			var bitIdx = uint(trailingZeroes64(word))
			word ^= 1 << bitIdx
			// Do something with the result of the next line
			var index = wordIdx + bitIdx
			checksum += index
		}
	}
	return checksum
}
Here is a related benchmark...
func BenchmarkFlorianUekermannIterateManyComp(b *testing.B) {
	var input = make([]uint64, 68)
	setRnd(input, 4)
	b.ResetTimer()
	var checksum = uint(0)
	for i := 0; i < b.N; i++ {
		checksum += good(input)
	}
	if checksum == 0 { // added just to fool ineffassign
		return
	}
}
Here is the version with our current API:
func BenchmarkFlorianUekermannIterateMany(b *testing.B) {
	var input = make([]uint64, 68)
	setRnd(input, 4)
	var bitmap = From(input)
	b.ResetTimer()
	var checksum = uint(0)
	for i := 0; i < b.N; i++ {
		buffer := make([]uint, 256)
		var last, batch = bitmap.NextSetMany(0, buffer)
		for len(batch) > 0 {
			for _, idx := range batch {
				checksum += idx
			}
			last, batch = bitmap.NextSetMany(last+1, batch)
		}
	}
	if checksum == 0 { // added just to fool ineffassign
		return
	}
}
Here are the results on a Linux server...
$ go version
go version go1.9 linux/amd64
dlemire@skylake:~/go/src/github.com/willf/bitset$ go test -bench=BenchmarkFlorianUekermannIterateMany
{0,1,2,3,4,5,6,7,8,9}
goos: linux
goarch: amd64
pkg: github.com/willf/bitset
BenchmarkFlorianUekermannIterateMany 3000000 523 ns/op
BenchmarkFlorianUekermannIterateManyReg 1000000 1738 ns/op
BenchmarkFlorianUekermannIterateManyComp 2000000 618 ns/op
Here are the results on my laptop...
$ go version
$ go test -bench=BenchmarkFlorianUekermannIterateMany
{0,1,2,3,4,5,6,7,8,9}
goos: darwin
goarch: amd64
pkg: github.com/willf/bitset
BenchmarkFlorianUekermannIterateMany-8 3000000 496 ns/op
BenchmarkFlorianUekermannIterateManyReg-8 1000000 1684 ns/op
BenchmarkFlorianUekermannIterateManyComp-8 3000000 579 ns/op
So according to these tests, your version is systematically slower.
I'm sure you can turn this into an iterator without too much effort
Pull request invited. Please also provide benchmarks showing that your code is faster.
Please upgrade to version 1.1.6, the latest version.
You'll want to test larger bitsets as well as lower densities. 68 and 4 were chosen only because that's where the bug shows up; they could have been anything.
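For example (sizes and number of halvings here are arbitrary, just for illustration), a larger and sparser input can be generated with the same setRnd helper from my first post:

var input = make([]uint64, 1<<20) // about 67 million bits instead of 68*64
setRnd(input, 10)                 // each bit survives with probability 2^-10, i.e. roughly 0.1% density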
It also seems like you replaced the TrailingZeros64 function; there's something odd going on with the compiler if I do that on my machine (not sure what, I didn't have time to check the generated code today).
I won't send a PR, since we aren't using this library. I just stumbled over the bug while checking the claim about batching. I'll retest with the fixed code, but I doubt it makes a big difference.
In the latest version, I have added tests at various densities. When the bitset is super low-density, the bulk of the time is just spent scanning zero words (arguably an indication that you shouldn't be using a bitset at all), so all methods are pretty much equivalent.
Otherwise, the NextSetMany method is much faster than NextSet, as I reported.
Surprisingly (to me), NextSetMany even beats your hand-crafted (non-iterated) approach. It may very well be that you could "turn this into an iterator without too much effort", but I don't know how to do it. I'd be very interested in a code sample.
$ go test -bench=BenchmarkFlorianUekermann
{0,1,2,3,4,5,6,7,8,9}
goos: linux
goarch: amd64
pkg: github.com/willf/bitset
BenchmarkFlorianUekermannIterateMany 3000000 516 ns/op
BenchmarkFlorianUekermannIterateManyReg 1000000 1790 ns/op
BenchmarkFlorianUekermannIterateManyComp 2000000 618 ns/op
BenchmarkFlorianUekermannLowDensityIterateMany 2000 1101767 ns/op
BenchmarkFlorianUekermannLowDensityIterateManyReg 1000 1239836 ns/op
BenchmarkFlorianUekermannLowDensityIterateManyComp 2000 1101817 ns/op
BenchmarkFlorianUekermannMidDensityIterateMany 100 12572068 ns/op
BenchmarkFlorianUekermannMidDensityIterateManyReg 50 25299202 ns/op
BenchmarkFlorianUekermannMidDensityIterateManyComp 100 14618127 ns/op
BenchmarkFlorianUekermannMidStrongDensityIterateMany 30 40667044 ns/op
BenchmarkFlorianUekermannMidStrongDensityIterateManyReg 10 104326575 ns/op
BenchmarkFlorianUekermannMidStrongDensityIterateManyComp 30 51650557 ns/op