NVIDIA/nvcomp

[BUG] The Cascaded bitpack can’t compress unsigned integers with a value greater than the maximum signed value

ser-mk opened this issue · 4 comments

Describe the bug

The cascaded bitpack layer fails to compress unsigned integer inputs in which some values are less than or equal to the maximum signed value and others are greater than it.

Steps/Code to reproduce bug

To reproduce, I took test_cascaded_batch.cpp and made a minimal example:
https://github.com/ser-mk/nvcomp/blob/v2.2.0_test/tests/test_bp_unsigned_type.cpp

Here I try to compress the same simple sequence of integers twice:
127,128,127,128,127,128,127,128,127,128,127,128,127,128,127,128,127,128,127,.....

At first, it was compressed as uint8_t. The output size is 40 bytes.

Then it was compressed as uint16_t. The output size is 28 bytes!

Expected behavior

It would seem that the output sizes should be roughly the same. But if we compress the sequence as uint16_t (or int16_t, uint32_t, etc.), we get a much better compression ratio than as uint8_t.

The full log of the example (checked on tag v2.2.0):
(screenshot of the test output log)

We can see that the first compression cannot shrink the input data and essentially just passes it through to the output buffer.

Environment details (please complete the following information):

tag v2.2.0

Additional context

I did some research and I think I found the reason:
https://github.com/NVIDIA/nvcomp/blob/v2.2.0/src/CascadedKernels.cuh#L384

The input unsigned type is cast to a signed type of the same size. Thus, 128 as uint8_t becomes -128 as int8_t, so the computed bit width of the sequence is 8 bits instead of the 1 bit it should be.
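A minimal standalone sketch of the effect (my own illustration, not nvcomp code; the bits_for_range() helper is made up):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Bits needed to bit-pack values in [lo, hi] against the minimum as a frame of reference.
static int bits_for_range(long long lo, long long hi) {
  unsigned long long range = static_cast<unsigned long long>(hi - lo);
  int bits = 0;
  while (range > 0) { ++bits; range >>= 1; }
  return bits;
}

int main() {
  const uint8_t lo_val = 127, hi_val = 128;  // the two values of the test sequence

  // Signed view, which is effectively what the kernel uses for uint8_t input:
  // 128 is reinterpreted as -128, so the range becomes [-128, 127].
  const int8_t s_a = static_cast<int8_t>(lo_val);  // 127
  const int8_t s_b = static_cast<int8_t>(hi_val);  // -128
  printf("signed view:   %d bits\n", bits_for_range(std::min(s_a, s_b), std::max(s_a, s_b)));  // 8

  // Unsigned view, which is what the caller expects for uint8_t input: range [127, 128].
  printf("unsigned view: %d bits\n", bits_for_range(lo_val, hi_val));  // 1
  return 0;
}
```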

Note:
Similar behavior can be reproduced with the Cascaded C API (already deprecated as of tag v2.2.0): https://github.com/NVIDIA/nvcomp/blob/v2.1.0/include/nvcomp/cascaded.h

Example
https://github.com/ser-mk/nvcomp/blob/v2.1.0_issue/tests/test_delta_bp_unsinged_type.cpp

In the example, I try to compress unsorted integers:
0, 1, 2, 1, 4, 5, 6, …
using the Delta and BitPacking layers.

At first, the data was compressed as int8_t. The output size is 108 bytes.
Then it was compressed as uint8_t. The output size is 128 bytes!
The full log of the example:

int8_t compress size: 108
uint8_t compress size: 128

I did some research and I think I found the reason:
The vals_delta array has the same type as the input data
https://github.com/NVIDIA/nvcomp/blob/v2.1.0/src/highlevel/CascadedCompressionGPU.cu#L484

For unsorted integers (… 2, 1, 3 …), the Delta function produces negative (signed) values, which are then passed to the BitPack function
https://github.com/NVIDIA/nvcomp/blob/v2.1.0/src/highlevel/CascadedCompressionGPU.cu#L684

In the uint8_t case, the BitPack function interprets these signed deltas as unsigned integers, i.e. a delta of -1 (as a signed type) becomes 255 as uint8_t. As a result, the computed bit width is higher than for the signed input type (the int8_t case https://github.com/ser-mk/nvcomp/blob/v2.1.0_issue/tests/test_delta_bp_unsinged_type.cpp#L94).
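A standalone sketch of that effect (again my own illustration, not nvcomp code; the delta_bitwidth() helper only mimics the behavior described above):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <vector>

static int bits_for_range(long long lo, long long hi) {
  unsigned long long range = static_cast<unsigned long long>(hi - lo);
  int bits = 0;
  while (range > 0) { ++bits; range >>= 1; }
  return bits;
}

// Deltas are stored in the same element type T as the input, so a negative
// difference wraps around when T is unsigned (e.g. -1 becomes 255 for uint8_t).
template <typename T>
static int delta_bitwidth(const std::vector<uint8_t>& in) {
  long long lo = std::numeric_limits<long long>::max();
  long long hi = std::numeric_limits<long long>::min();
  for (size_t i = 1; i < in.size(); ++i) {
    const T d = static_cast<T>(in[i] - in[i - 1]);
    lo = std::min<long long>(lo, d);
    hi = std::max<long long>(hi, d);
  }
  return bits_for_range(lo, hi);
}

int main() {
  const std::vector<uint8_t> data = {0, 1, 2, 1, 4, 5, 6};
  printf("int8_t deltas:  %d bits\n", delta_bitwidth<int8_t>(data));   // deltas in [-1, 3] -> 3 bits
  printf("uint8_t deltas: %d bits\n", delta_bitwidth<uint8_t>(data));  // -1 wraps to 255 -> 8 bits
  return 0;
}
```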

To compress wide-range integers more effectively, I would suggest improving the bit-width calculation. Currently, the library treats the input integers as a signed type only when calculating the bit width (https://github.com/NVIDIA/nvcomp/blob/v2.2.0/src/CascadedKernels.cuh#L384),
because the minimum/maximum values of the input integers are computed once, for the signed interpretation only.

If we compute the minimum/maximum values for both the signed and the unsigned interpretation, we can take the smaller of the two max-min differences, which gives the best bit width (see the sketch after the examples below).
For a simple example:

int8_t array[2] = [-128, 127]

for the signed interpretation: min value -128, max value 127. Difference 255, bit width 8 bits
for the unsigned interpretation: min value 127, max value 128. Difference 1, bit width 1 bit
As you can see, this operation reduces the bit width from 8 bits to 1 bit!

The same applies to other wide-range sequences, for example:
uint8_t: 0, 255, 0, 255, …
int8_t: -128, 127, -128, 127, …
uint16_t: 0, 65535, 0, …
int16_t: -32768, 32767, …
etc.
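For illustration, here is a minimal CPU-side sketch of the idea for uint8_t data (my own code, not the actual draft patch; the helper names are made up). It computes the range for each interpretation with std::minmax_element and bit-packs against whichever frame of reference is narrower:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

static int bits_for_range(unsigned long long range) {
  int bits = 0;
  while (range > 0) { ++bits; range >>= 1; }
  return bits;
}

static int best_bitwidth(const std::vector<uint8_t>& data) {
  // Range of the unsigned interpretation.
  const auto [umin, umax] = std::minmax_element(data.begin(), data.end());
  const auto urange = static_cast<unsigned long long>(*umax - *umin);

  // Range of the signed interpretation of the same bytes.
  const auto signed_less = [](uint8_t a, uint8_t b) {
    return static_cast<int8_t>(a) < static_cast<int8_t>(b);
  };
  const auto [smin, smax] = std::minmax_element(data.begin(), data.end(), signed_less);
  const auto srange = static_cast<unsigned long long>(
      static_cast<int8_t>(*smax) - static_cast<int8_t>(*smin));

  // Use whichever interpretation yields the smaller max-min difference.
  return bits_for_range(std::min(urange, srange));
}

int main() {
  const std::vector<uint8_t> a = {128, 127, 128, 127};  // narrow as unsigned, wide as signed
  const std::vector<uint8_t> b = {0, 255, 0, 255};      // narrow as signed, wide as unsigned
  printf("{128,127,...}: %d bit(s)\n", best_bitwidth(a));  // 1
  printf("{0,255,...}:   %d bit(s)\n", best_bitwidth(b));  // 1
  return 0;
}
```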
I have written a draft with this feature:
ser-mk@c779e47
and a test:
https://github.com/ser-mk/nvcomp/blob/v2.2.0/fix_bitpacking_for_wide_range/tests/test_cascaded_batch.cpp#L1209

If you run the test without this patch, it fails because the bitpack layer cannot compress the wide-range integers, so the output compressed size is larger than expected.

Note:
Making the library find the min/max values twice might not be the most efficient way of tackling the issue. There is a faster way to calculate the min/max for the signed and unsigned interpretations in one pass, but it requires a more complex code design.
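For illustration, a CPU-side sketch of the one-pass idea (on the GPU this would be a fused reduction; this is just my sketch, not the draft code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Single pass over the data, tracking min/max for the unsigned and the signed
// interpretation at the same time; returns the narrower of the two ranges.
// Assumes data is non-empty.
static unsigned long long narrowest_range(const std::vector<uint8_t>& data) {
  uint8_t umin = data[0], umax = data[0];
  int8_t smin = static_cast<int8_t>(data[0]), smax = smin;
  for (const uint8_t v : data) {
    umin = std::min(umin, v);
    umax = std::max(umax, v);
    const int8_t s = static_cast<int8_t>(v);
    smin = std::min(smin, s);
    smax = std::max(smax, s);
  }
  return std::min<unsigned long long>(umax - umin, smax - smin);
}
```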

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Thanks so much for reporting this, and sorry that it took a while to reply! We have a fix for this, but it probably won't be in place until nvcomp 2.4.

The code for determining the frame of reference for the bit packing always used a signed type, even for unsigned input data types, since that is usually what is most useful when there is at least one delta-encoding pass. But as you found out, if there is no delta pass, it misses the opportunity to compress. In the fixed version, if there is no delta pass, the specified input data type (e.g. uchar or char) determines whether it looks for a signed or unsigned frame of reference. Trying to pick whichever would be better for each block made things a bit slower, so the choice is static, based on the specified type.
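A rough sketch of that selection rule (an illustration of the description above, not the actual nvcomp 2.4 code):

```cpp
#include <type_traits>

// With a delta pass the packed values can be negative, so a signed frame of
// reference is used; with no delta pass, follow the signedness of the
// declared input type.
template <typename InputT>
constexpr bool use_signed_frame_of_reference(int num_delta_passes) {
  return num_delta_passes > 0 || std::is_signed<InputT>::value;
}

static_assert(!use_signed_frame_of_reference<unsigned char>(0), "uchar, no delta: unsigned");
static_assert( use_signed_frame_of_reference<signed char>(0),   "char, no delta: signed");
static_assert( use_signed_frame_of_reference<unsigned char>(1), "any type with delta: signed");
```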