bits-and-blooms/bitset

MarshalBinary() returns a []byte with an unexpected length

Vanlius opened this issue · 2 comments

I have 3648 bool values. After setting several bits with BitSet.Set(), I convert the BitSet to []byte using MarshalBinary(). Since 1 byte = 8 bits, the []byte length should in theory be 456, but the result turned out not to be 456 bytes long. May I ask why this is?

sample:

t1 := bitset.New(3648)
t1.Set(1).Set(100).Set(1000).Set(2000).Set(3000)
result, err := t1.MarshalBinary() // MarshalBinary returns ([]byte, error)
if err != nil {
	panic(err)
}
fmt.Println(len(result)) // The actual result is not 456

We have a function which can be used to get the binary size (in bytes):

func (b *BitSet) BinaryStorageSize() int

You can also directly access the underlying array if you'd like to write your own marshalling procedure...

func (b *BitSet) Bytes() []uint64 

"According to the calculation of 1 byte = 8 bits, the []byte length should in theory be 456."

That is an incorrect computation. To serialize an array of x bytes, you cannot simply write x bytes. You need some kind of length indicator so that when you are reading the bytes back, you know that there are x bytes. The same is true in memory. A slice of x bytes in Go does not occupy just x bytes. You need the x bytes, plus some metadata (which indicates the length, among other things), and you may need some alignment padding.
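To make the length indicator concrete, here is a minimal sketch using only the standard library. It assumes a layout of an 8-byte length header followed by whole 64-bit words; the names encodeWithHeader and wordsNeeded are hypothetical, and the layout is an illustration rather than a statement of the exact format MarshalBinary() uses.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// wordsNeeded returns how many 64-bit words are required to hold nbits bits.
func wordsNeeded(nbits uint64) int {
	return int((nbits + 63) / 64)
}

// encodeWithHeader is a hypothetical encoder: it writes the bit length as an
// 8-byte big-endian header, followed by every 64-bit word of the payload.
func encodeWithHeader(nbits uint64, words []uint64) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, nbits); err != nil {
		return nil, err
	}
	if err := binary.Write(&buf, binary.BigEndian, words); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	const nbits = 3648
	words := make([]uint64, wordsNeeded(nbits)) // 57 words
	out, err := encodeWithHeader(nbits, words)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out)) // 8-byte header + 57*8 payload bytes = 464
}
```

Under this assumed layout, 3648 bits take 464 bytes rather than 456: the payload matches the 1 byte = 8 bits estimate, and the extra 8 bytes are the length header a reader needs in order to know where the data ends.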

However, your computation is 'almost' correct: it is correct at scale... because it is only off by a small additive overhead of less than 64 bytes, not by a multiplicative factor. So for large bitsets, 99.999% of the space usage will match your computation.
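As a rough illustration of how the gap stays constant while the bitset grows, the sketch below compares the naive estimate with a framed size. It again assumes an 8-byte header plus whole 64-bit words; naiveSize and framedSize are hypothetical helpers, not library functions.

```go
package main

import "fmt"

// naiveSize is the "1 byte = 8 bits" estimate: ceil(nbits / 8).
func naiveSize(nbits int) int { return (nbits + 7) / 8 }

// framedSize assumes an 8-byte length header plus whole 64-bit words.
func framedSize(nbits int) int { return 8 + 8*((nbits+63)/64) }

func main() {
	for _, n := range []int{10, 3648, 1_000_000} {
		naive, framed := naiveSize(n), framedSize(n)
		// The difference is the header plus any padding in the last word.
		fmt.Printf("bits=%d naive=%d framed=%d overhead=%d\n",
			n, naive, framed, framed-naive)
	}
}
```

With these assumptions the overhead never exceeds the header plus the padding in the last word, so its share of the total shrinks as the number of bits increases.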

This is entirely general, btw, and not limited to bitset. You always need some metadata.

Of course, maybe you can save some bytes in your particular case because you can make some further assumptions. If so, then you can use the Bytes() method described above and write your own marshalling code.
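A minimal sketch of such custom marshalling, assuming both the writer and the reader already know the bit length (3648 here), so no header is needed. The function packBits is hypothetical; it takes a []uint64 of the shape returned by Bytes() and assumes bit i lives in word i/64 at position i%64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packBits is a hypothetical header-free encoder. words is the backing array
// (the shape returned by Bytes()) and nbits is the bit length that both sides
// agree on in advance. It assumes nbits <= 64*len(words) and emits exactly
// ceil(nbits/8) bytes, with bit i stored in byte i/8 at position i%8.
func packBits(words []uint64, nbits int) []byte {
	full := make([]byte, 8*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint64(full[8*i:], w)
	}
	return full[:(nbits+7)/8]
}

func main() {
	words := make([]uint64, 57)      // 3648 bits / 64 bits per word
	words[100/64] |= 1 << (100 % 64) // stand-in for setting bit 100
	out := packBits(words, 3648)
	fmt.Println(len(out))                     // 456
	fmt.Println(out[100/8]&(1<<(100%8)) != 0) // true
}
```

In real use one would pass t1.Bytes() as words. The trade-off is that the output is no longer self-describing: a reader who does not know the bit length cannot tell how many bits were stored.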

thank you so much @lemire