NVIDIA/nvcomp

[QST] Strange behavior with cascade batch compressor

ser-mk opened this issue · 3 comments

I have found that some datasets cannot be compressed or converted with the batched nvcomp compressor without a runtime error.
This is the case for a sequence of integers that doesn't compress well. Let's take a look.
I took test_cascaded_batch.cpp and made a minimal example for the demo.
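Roughly, the demo builds one chunk that is just a ramp of consecutive byte values (so every delta is 1 and no value repeats) and runs it through several cascaded option combinations. Below is a sketch of that setup, not the literal test code; the nvcompBatchedCascadedOpts_t field order ({chunk_size, type, num_RLEs, num_deltas, use_bp}) is assumed from the v2.2 cascaded.h header.

#include <cstddef>
#include <cstdint>
#include <vector>

#include <nvcomp/cascaded.h>

// Hypothetical reconstruction of the demo input: 240 consecutive uint8_t
// values, so every delta is 1 and no value repeats consecutively.
std::vector<uint8_t> make_ramp_input()
{
  std::vector<uint8_t> input(240);
  for (std::size_t i = 0; i < input.size(); ++i) {
    input[i] = static_cast<uint8_t>(i);
  }
  return input;
}

// Option combinations exercised by the demo (field order assumed from the
// v2.2 nvcompBatchedCascadedOpts_t definition).
const nvcompBatchedCascadedOpts_t option_sets[] = {
    {4096, NVCOMP_TYPE_UCHAR, /*num_RLEs=*/0, /*num_deltas=*/1, /*use_bp=*/0},
    {4096, NVCOMP_TYPE_UCHAR, /*num_RLEs=*/1, /*num_deltas=*/0, /*use_bp=*/0},
    {4096, NVCOMP_TYPE_UCHAR, /*num_RLEs=*/1, /*num_deltas=*/0, /*use_bp=*/1},
    {4096, NVCOMP_TYPE_UCHAR, /*num_RLEs=*/0, /*num_deltas=*/0, /*use_bp=*/1},
    {4096, NVCOMP_TYPE_UCHAR, /*num_RLEs=*/0, /*num_deltas=*/1, /*use_bp=*/1},
};
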
If you run the example, you get the following result (checked on tags v2.1.0 and v2.2.0):

[screenshot of the program output]

You can see the input data in the compressed output:

[screenshot highlighting the input bytes inside the output]
The function nvcompBatchedCascadedCompressAsync returns nvcompSuccess.
I did some digging and saw the following.
The compression algorithm breaks at this step:

 if (output + padded_out_bytes / sizeof(uint32_t) > output_limit) {
    return BlockIOStatus::out_of_bound;
  }

The output pointer exceeds output_limit, which is calculated here:

    // the metadata. It is users responsibility to guarantee this requirement.
    uint32_t* output_limit
        = output_buffer + roundUpDiv(partition_metadata_size, sizeof(uint32_t))
          + roundUpDiv(input_bytes, sizeof(uint32_t));

The output_limit variable budgets only for the partition metadata and the input data, yet that same space seemingly also has to hold the delta or bit-packing output. This is not safe, because chunk_metadata_size takes an additional cut of the output buffer:

      // Move current output pointer as the end of chunk metadata
      current_output_ptr += chunk_metadata_size / sizeof(uint32_t);
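
To make this concrete, here is a back-of-the-envelope check with hypothetical metadata sizes (the real values are internal to the kernel and will differ; only the imbalance matters):

#include <cstddef>
#include <cstdint>

// Illustration only: output_limit budgets for partition metadata plus the raw
// input, but the kernel also advances the output pointer by the chunk metadata
// before writing the (possibly incompressible) payload.
constexpr std::size_t roundUpDiv(std::size_t x, std::size_t y) { return (x + y - 1) / y; }

constexpr std::size_t input_bytes             = 240; // one chunk of the demo data
constexpr std::size_t partition_metadata_size = 8;   // hypothetical
constexpr std::size_t chunk_metadata_size     = 16;  // hypothetical

// Words available before hitting output_limit: 2 + 60 = 62.
constexpr std::size_t limit_words =
    roundUpDiv(partition_metadata_size, sizeof(std::uint32_t))
    + roundUpDiv(input_bytes, sizeof(std::uint32_t));

// Words actually written when the data does not compress:
// chunk metadata (4) + payload (60) = 64, which is past output_limit,
// so the bounds check above reports BlockIOStatus::out_of_bound.
constexpr std::size_t needed_words =
    roundUpDiv(chunk_metadata_size, sizeof(std::uint32_t))
    + roundUpDiv(input_bytes, sizeof(std::uint32_t));

static_assert(needed_words > limit_words,
              "chunk metadata pushes the write past output_limit");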

Perhaps that's on purpose. Maybe the nvcomp library prefers to choose the smallest output: the uncompressed data plus the minimal partition metadata is smaller than badly compressed data plus partition metadata plus chunk metadata. However, I still need the transformed data even for a dataset that doesn't compress well, because I am going to use it in a cascade algorithm of my own devising.

Thanks so much for reporting this! Sorry that it took a little while to reply. The out of bounds access is due to a mistake in the test_cascaded_batch.cpp file. It shouldn't be implementing its own max_compressed_size function, but should instead be calling the nvcompBatchedCascadedCompressGetMaxOutputChunkSize API function. Using that function, the max compressed size is 248 bytes, so it avoids the out of bounds access. We'll be sure to fix the test file, and I'll double-check to see if there might be other places doing something like this, though at first glance, this looks like it might be the only place.
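
For reference, sizing the per-chunk output buffer through the API rather than a hand-rolled estimate looks roughly like this. It is a sketch: the signature and the opts field order are assumed from the v2.2 low-level batched cascaded header, and error handling is minimal.

#include <cstddef>

#include <nvcomp/cascaded.h>

// Sketch: ask the library for the worst-case compressed chunk size instead of
// computing it by hand in the test. Opts field order assumed from the v2.2
// cascaded.h header: {chunk_size, type, num_RLEs, num_deltas, use_bp}.
std::size_t query_max_compressed_chunk_size(std::size_t uncompressed_chunk_bytes)
{
  const nvcompBatchedCascadedOpts_t opts = {
      4096, NVCOMP_TYPE_UCHAR, /*num_RLEs=*/0, /*num_deltas=*/1, /*use_bp=*/1};

  std::size_t max_out_bytes = 0;
  const nvcompStatus_t status = nvcompBatchedCascadedCompressGetMaxOutputChunkSize(
      uncompressed_chunk_bytes, opts, &max_out_bytes);
  if (status != nvcompSuccess) {
    return 0; // real code should surface the error
  }
  // For the 240-byte chunk in this issue, the comment above reports 248 bytes;
  // allocating at least this much per output chunk avoids the overrun.
  return max_out_bytes;
}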

As for the lack of compression with the first 4 sets of options: if the compression wouldn't result in a smaller size than the original data, it just falls back to the uncompressed data. For the case of using deltas (differences) alone, all of the deltas after the first value are 1, but because bit packing isn't enabled, all of the 1's would be stored as a full byte, resulting in no compression. For the case of RLE (run-length encoding) alone, there are no repeated consecutive values, so RLE (storing the values plus repeat counts that are all 1) would be larger than the original data. RLE with bit packing would bit pack all of the repeat counts of 1, but since the unique value range (size 240) spans all 8 bits of the data type, the values can't be reduced, meaning that the counts and the values together are still larger than the original data. With bit packing alone, the value range is still 240, so it still requires all 8 bits, meaning the output would be identical to the original data anyway. With deltas and bit packing, all of the deltas (ignoring the first value) are 1, so they can be bit packed down to 1 bit (or potentially zero bits, but I haven't double-checked whether that's supported), meaning that there's a lot less data than the original.
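
A quick back-of-the-envelope check of those cases on a ramp of 240 consecutive byte values (payload sizes only, ignoring nvcomp's per-chunk metadata, so the numbers are purely illustrative):

#include <cstdio>

// Rough payload sizes for a ramp of 240 consecutive uint8_t values: deltas
// are all 1, there are no repeated runs, and the value range (240) needs all
// 8 bits. Metadata is ignored; this only shows why the first four schemes
// cannot beat the 240-byte original while delta + bit packing can.
int main()
{
  const int n = 240;

  const int original      = n;                    // 240 bytes
  const int delta_only    = n;                    // deltas stored as full bytes
  const int rle_only      = n + n;                // 240 values + 240 run counts
  const int rle_bitpack   = n + (n + 7) / 8;      // counts of 1 pack to ~1 bit each
  const int bitpack_only  = n;                    // range 240 still needs 8 bits
  const int delta_bitpack = 1 + (n - 1 + 7) / 8;  // first value + 1-bit deltas

  std::printf("original         %3d bytes\n", original);
  std::printf("deltas only      %3d bytes\n", delta_only);
  std::printf("RLE only         %3d bytes\n", rle_only);
  std::printf("RLE + bitpack    %3d bytes\n", rle_bitpack);
  std::printf("bitpack only     %3d bytes\n", bitpack_only);
  std::printf("deltas + bitpack %3d bytes\n", delta_bitpack);
  return 0;
}
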
Hopefully that clears things up.

Thanks for your answer! You explained this case very well.
