bits-and-blooms/bitset

Cap() validation is not working as expected

Closed this issue · 6 comments

Using the below code:

b := bitset.New(1)
b.Set(math.MaxInt64)

it causes panic: runtime error: makeslice: len out of range

Digging deeper Set(i uint) has

if i >= b.length { // if we need more bits, make 'em
  b.extendSet(i)
}

and extendSet(i uint) validates the input by

if i >= Cap() {
  panic("You are exceeding the capacity")
}

My expectation was that I will run into "safety" rules of the library rather than regular panic.

My expectation was that I will run into "safety" rules of the library rather than regular panic.

There is a panic from the Go runtime because the bitset you request exceeds the maximal allocation possible on your system.

This is how the Go language works... if you do...

	v := make([]int, math.MaxInt64)

You will get a panic. And, indeed, that's where the panic comes from.

I have reviewed our code. Here it is:

// Set bit i to 1, the capacity of the bitset is automatically
// increased accordingly.
// If i>= Cap(), this function will panic.
// Warning: using a very large value for 'i'
// may lead to a memory shortage and a panic: the caller is responsible
// for providing sensible parameters in line with their memory capacity.
func (b *BitSet) Set(i uint) *BitSet {}

Notice this part:

// Warning: using a very large value for 'i'
// may lead to a memory shortage and a panic: the caller is responsible
// for providing sensible parameters in line with their memory capacity.

I am closing this issue. I think that we are quite careful. The function is well documented.

If you disagree with this behaviour, you should take it up with the Go language team.

Thank you for the explanation @lemire

I definitely missed the warning part however it does not actually address my concern.

I am struggling to understand application of exported Cap() method. Per docs

// Cap returns the total possible capacity, or number of bits

My understanding was that the method indicates the max number of bits I can set. E.g. I could use it like...

myVal := uint(...) // some big number
if myVal < bitset.Cap() {
  b.Set(myVal)
}

In my system I am getting the value 18446744073709551615 for Cap() and 9223372036854775807 for math.MaxInt64 so I try setting that much but fail.

I might not be seeing the real intent of Cap() then.
I am totally fine taking the responsibility for handling the limits in my app myself. As I user of the library though I would expect that library expose maxes it can handle.

As I user of the library though I would expect that library expose maxes it can handle.

It does precisely that. It tells you how much the library can support.

The panic you got is from the standard Go library. You get an allocation error.

How large a slice you can allocate is system specific. This how Go works. The following will panic:

v := make([]int, math.MaxInt64)

This is the Go language. It has nothing to do with the library.

Now, if you know some way to find out, at any give time, in a Go program, the maximal value k such that

v := make([]int, k)

will not panic, then I'd be very interested.

Let me stress again: this is how Go works.

Furthermore, you also contributed to the library non-trivial code (thanks!). So you know that it is implemented with a slice internally.

You wrote the following code:

	if len(b.set) < nsize {
 		dst = make([]uint64, nsize, 2*nsize)
 	}

The make call may panic if the system does not have enough memory. You seem to imply that it should not panic. Why did you not need compute the allowable amount of memory to avoid the panic?

Because this is how we code in Go! We assume that memory allocations succeed, and if not, a panic is raised.

If you are concerned about it, you can use a defer to recover from the panic. But, quite often, programs in Go that suffer a memory allocation failure just die. That is the default.

In my system I am getting the value 18446744073709551615 for Cap() and 9223372036854775807 for math.MaxInt64 so I try setting that much but fail.

A 64-bit system should have a theoretical limit of 18446744073709551615. An IBM s390x has a limit of 18446744073709551615 bytes, and reportedly Go might support that (theoretically). So on a mainframe, you could get close to this limit. I am not sure whether there are actual systems with that much memory (exabytes). But there might be one in the future.

The Cap limit is more practical for 32-bit systems where it is 4294967295. That can definitively be reached and is the main reason why we have a Cap in the first place. If you write portable code in Go, you should not assume that it will run on a 64-bit system, someone might run it on a 32-bit system... and the library itself should not fail because it is a 32-bit system. Now, there is a different limit in our library depending on whether you have a 32-bit or 64-bit system.

Going back to what you attempted to do...

Ok. So a BitSet of storing 9223372036854775807 bits would use 1 exabytes of RAM (memory).

I am going to assume that you do not have an IBM s390x mainframe. If it failed on an IBM s390x mainframe with enough memory, please give me access to the mainframe and I will debug it.

Though it may have increased, this stackoverflow answer suggests that the Linux kernel is limited at 128 terabytes of RAM. I don't know if any system is capable of that. Windows 11 is reportedly limited to 6 terabytes of memory.

Thus the memory allocation will fail, assuredly, and you will get a panic.

Importantly, it is not the library that will fail you, but your system. Get an IBM s390x mainframe with a couple exabytes of memory and it could work !!! I say could because I never tried it.

Note that I would not assume that it will be fast. Allocating an exabyte of memory could take quite a bit of time.

You might object... "how could I know how much memory a BitSet instance take".

Let us look at the README file...

In the first few paragraphs...

Screenshot 2024-08-23 at 10 55 39 PM

And then later...

Screenshot 2024-08-23 at 10 46 58 PM

So we tell you how much memory it takes... very explicitly... and we point at the roaring library which can definitively scale...

If you need to, you can scale up to the 64-bit scale:

package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/v2/roaring64"
)

func main() {
	rb1 := roaring64.BitmapOf(9223372036854775807)
	fmt.Println(rb1.String())
}

This will work on any machine.

Thank you @lemire for the very detailed explanation.

As I user of the library though I would expect that library expose maxes it can handle.

By saying that I did not mean to accuse the library in not helping the user.
I was a tad naive on practical vs theoretical limits and what I tried to do is indeed quite far from feasible values :)

@skullhole Sure. I wanted to explain in details because there are often misunderstandings regarding memory allocation. This is true irrespective of the programming language.

There is no perfect solution. If allocation fails... what do you do? The default in Go is to panic and (generally) exit the program. This is a common solution.