lemire/simdcomp

How to get the usable bytes from a compressed array?

Closed this issue · 11 comments

I understand I have to malloc enough memory to hold the compressed array, but how do I retrieve the usable information? Sorry that I'm not a sophisticated user.

Can you make your question precise?

Which function are you calling?

https://github.com/lemire/simdcomp/blob/master/include/simdbitpacking.h#L17

@lemire I'm calling simdpack_length, so an arbitrary length array (without zeros). Would it make sense to use malloc_usable_size?

I am not sure why you would need malloc_usable_size. I guess you could use malloc_usable_size to check whether you have enough memory, but it is probably faster to keep track of that yourself rather than rely on malloc_usable_size unless you are not in charge of memory allocation.

I am afraid you will need to elaborate.

Taken from the example in the readme:

simdpack_length(datain, N, (__m128i *)buffer, b);

I simply don't know how to calculate the length of buffer...

@wshager

The spec. says

Returns a pointer to the (advanced) compressed array.

Isn't that clear?

@lemire probably not. Does "advanced" mean anything in particular, or does it just mean that it's advanced stuff? What I get from the example code is that I should work with the buffer that is passed as the parameter, not with the return value of the function (i.e. the return value isn't used).

What surprises me is that while the buffer should contain a compressed array, the memory allocated to it exceeds that of the uncompressed array. What I don't understand is how to retrieve the compressed array.

@lemire Perhaps it would help me if there was an example where the compressed buffer is somehow serialized.

Compressed data is stored in the memory location between the provided (out) pointer and the returned pointer.
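To make the "advanced pointer" convention concrete, here is a minimal sketch that does not use simdcomp at all. `toy_pack8` and `used_bytes` are hypothetical helpers invented for illustration: the toy packer assumes every input value fits in 8 bits and stores four values per 32-bit word, then returns the pointer just past the last word it wrote. The number of usable bytes is the distance between the pointer you passed in and the pointer you got back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packer (NOT part of simdcomp).
   Assumes every value fits in 8 bits; stores 4 values per 32-bit word.
   Returns the "advanced" pointer: one past the last word written. */
static uint32_t *toy_pack8(const uint32_t *in, size_t n, uint32_t *out) {
    size_t i;
    size_t nwords = (n + 3) / 4; /* round up to whole output words */
    assert(in != NULL && out != NULL);
    memset(out, 0, nwords * sizeof(uint32_t));
    for (i = 0; i < n; ++i) {
        out[i / 4] |= (in[i] & 0xFFu) << (8 * (i % 4));
    }
    return out + nwords;
}

/* Bytes occupied between the start pointer and the advanced pointer. */
static size_t used_bytes(const uint32_t *start, const uint32_t *end) {
    return (size_t)(end - start) * sizeof(uint32_t);
}
```

With 10 input values the toy packer writes 3 words, so `used_bytes(out, toy_pack8(in, 10, out))` is 12 bytes, compared with 40 bytes for the unpacked input. The same pointer arithmetic should apply to a real packer that follows this convention, with the element type swapped for whatever the output pointer points to.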

I'll add an example.

@lemire right! Thank you very much.

I have extended the API by adding a simdpack_compressedbytes function. Here is how one might use it:

  b = maxbits_length(datain, N);
  buffer = malloc(simdpack_compressedbytes(N,b)); // allocate just enough memory
  endofbuf = simdpack_length(datain, N, (__m128i *)buffer, b);
  /* compressed data is stored between buffer and endofbuf using (endofbuf-(__m128i *)buffer)*sizeof(__m128i) bytes */
  /* would be safe to do : buffer = realloc(buffer,(endofbuf-(__m128i *)buffer)*sizeof(__m128i)); */
  simdunpack_length((const __m128i *)buffer, N, backbuffer, b);

I hope this helps.
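For the serialization case raised earlier in the thread, one possible approach is to write the byte count first and then the bytes between the start pointer and the advanced pointer. The sketch below is an assumption about how one might do this, not something simdcomp provides: `write_blob` and `read_blob` are hypothetical helpers, and the length prefix is stored in the host's byte order, so the file is only portable between machines with the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes a 64-bit byte count, then the bytes in
   [start, end). Returns 0 on success, -1 on a write error. */
static int write_blob(FILE *f, const void *start, const void *end) {
    uint64_t nbytes = (uint64_t)((const char *)end - (const char *)start);
    assert(f != NULL);
    if (fwrite(&nbytes, sizeof nbytes, 1, f) != 1) return -1;
    if (nbytes > 0 && fwrite(start, 1, (size_t)nbytes, f) != nbytes) return -1;
    return 0;
}

/* Hypothetical helper: reads a blob written by write_blob.
   Returns a malloc'ed buffer (caller frees) or NULL on error. */
static void *read_blob(FILE *f, size_t *nbytes_out) {
    uint64_t nbytes;
    void *buf;
    assert(f != NULL && nbytes_out != NULL);
    if (fread(&nbytes, sizeof nbytes, 1, f) != 1) return NULL;
    buf = malloc(nbytes > 0 ? (size_t)nbytes : 1); /* avoid malloc(0) */
    if (buf == NULL) return NULL;
    if (nbytes > 0 && fread(buf, 1, (size_t)nbytes, f) != nbytes) {
        free(buf);
        return NULL;
    }
    *nbytes_out = (size_t)nbytes;
    return buf;
}
```

In the snippet above, the call would look like `write_blob(f, buffer, endofbuf)`. To decompress later you also need N and b, so those would have to be stored alongside the blob.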

@lemire that's exactly what I wanted to know, thanks.