Blosc/c-blosc2

BYTEDELTA,CPU_HAS_SIMD=1 incompatible with BYTEDELTA,CPU_HAS_SIMD=0

froody opened this issue · 2 comments

froody commented

Describe the bug
Both the forward and backward pass of bytedelta initialize a uint8_t _v2 = 0 which has a subtle dual use. Chunks where stream_len % 16 != 0 result in different output for CPU_HAS_SIMD=0 vs CPU_HAS_SIMD=1.

In the CPU_HAS_SIMD=1 case, bytes16 v2 is continually updated with the current value of v2, but once the common tail/leftover path is entered, _v2 is re-initialized to 0 partway through the stream. In the CPU_HAS_SIMD=0 case, _v2 is continually updated with the rolling value.

The workaround to read data generated by CPU_HAS_SIMD=1 on systems with CPU_HAS_SIMD=0 is to reset _v2 to 0 when stream_len % 16 != 0 && ip % 16 == 0 && ip + 15 > stream_len, but the ideal fix is to initialize _v2 with the appropriate value from the bytes16 v2 when CPU_HAS_SIMD=1. Fixing it the ideal way would be backward incompatible with any previously serialized data so it seems like there's no perfect solution.

To Reproduce

  1. Compile the program below and statically link it to libblosc2, ensure that CPU_HAS_SIMD == 1 in plugins/filters/bytedelta/bytedelta.c. Rename this binary to bug_simd.
  2. Hardcode CPU_HAS_SIMD = 0 in filters/bytedelta/bytedelta.c, recompile libblosc2, and recompile and statically relink the program below. Rename this to bug_generic.
  3. ./bug_simd 1 # Successful roundtrip data <-> schunk !
  4. ./bug_generic 1 # Successful roundtrip data <-> schunk !
  5. ./bug_simd # Decompressed data differs from original 116, 231382041, 231371503!
  6. ./bug_simd 1 # Successful roundtrip data <-> schunk !
  7. ./bug_generic # Decompressed data differs from original 116, 231382041, 231392835!
#include <stdio.h>
#include <blosc2.h>
#include "blosc2/filters-registry.h"

#define NCHUNKS 1
#define NTHREADS 2
#define TYPESIZE 4
#define CHUNKSIZE (TYPESIZE * 29) // (CHUNKSIZE / TYPESIZE) % 16 must != 0 to exploit this bug.

int32_t ref_data[CHUNKSIZE] = {
  546303464, 823102828, 763542202,  46073357, 639725367, 831718428,
  754955461, 313104751, 556923531, 902333302, 874693449, 244457120,
  42880164, 677450262, 909282669, 716283249, 231382041, 835942381,
  341227635, 956334593, 656239542, 588781027, 805567515, 504325858,
  734152813, 220178903, 436001506, 691382423, 534977848
};

int main(int argc, char **argv) {
  blosc2_init();
  static int32_t data_dest[CHUNKSIZE];
  int32_t isize = CHUNKSIZE * sizeof(int32_t);
  blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
  cparams.compcode = BLOSC_ZSTD;
  cparams.filters[BLOSC2_MAX_FILTERS - 1] = BLOSC_FILTER_BYTEDELTA;
  cparams.filters_meta[BLOSC2_MAX_FILTERS - 1] = 7;

  blosc2_dparams dparams = BLOSC2_DPARAMS_DEFAULTS;
  blosc2_schunk* schunk;

  /* Create a super-chunk container */
  cparams.typesize = TYPESIZE;
  cparams.nthreads = NTHREADS;
  dparams.nthreads = NTHREADS;
  char *urlpath = (char *)"/tmp/dir1.b2frame";
  blosc2_storage storage = {.contiguous=false, .urlpath=urlpath, .cparams=&cparams, .dparams=&dparams};

  if (argc > 1) {
    /* Remove the directory and write new data */
    blosc2_remove_dir(storage.urlpath);
    schunk = blosc2_schunk_new(&storage);
    blosc2_schunk_append_buffer(schunk, ref_data, isize);
  }

  /* Read the chunks that were written before */
  schunk = blosc2_schunk_open(urlpath);
  blosc2_schunk_decompress_chunk(schunk, 0, data_dest, isize);
  for (int i = 0; i < CHUNKSIZE; i++) {
    if (data_dest[i] != ref_data[i]) {
      printf("Decompressed data differs from original %d, %d, %d!\n",
             CHUNKSIZE, ref_data[i], data_dest[i]);
      return -1;
    }
  }

  printf("Successful roundtrip data <-> schunk !\n");
  blosc2_schunk_free(schunk);
  blosc2_destroy();
  return 0;
}

Expected behavior
Steps 3 through 4 each print Successful roundtrip data <-> schunk !

Logs
N/A

System information:

  • OS: Ubunt 22.04
  • Compiler: clang-13
  • Version 0787e1b

Additional context
N/A

Thanks @froody . Provided that bytedelta has been introduced quite recently, I'd be in favor of the ideal fix. @aras-p what do you think?

aras-p commented

Yeah I guess a proper fix would be best for now (the more time passes, the harder it is to create a proper fix).