BYTEDELTA,CPU_HAS_SIMD=1 incompatible with BYTEDELTA,CPU_HAS_SIMD=0
froody opened this issue · 2 comments
Describe the bug
Both the forward and backward pass of bytedelta initialize a uint8_t _v2 = 0
which has a subtle dual use. Chunks where stream_len % 16 != 0
result in different output for CPU_HAS_SIMD=0 vs CPU_HAS_SIMD=1.
In the CPU_HAS_SIMD=1 case, bytes16 v2
is continually updated with the current value of v2, but once the common tail/leftover path is entered, _v2
is re-initialized to 0
partway through the stream. In the CPU_HAS_SIMD=0 case, _v2
is continually updated with the rolling value.
The workaround to read data generated by CPU_HAS_SIMD=1
on systems with CPU_HAS_SIMD=0
is to reset _v2 to 0 when stream_len % 16 != 0 && ip % 16 == 0 && ip + 15 > stream_len
, but the ideal fix is to initialize _v2 with the appropriate value from the bytes16 v2 when CPU_HAS_SIMD=1. Fixing it the ideal way would be backward incompatible with any previously serialized data so it seems like there's no perfect solution.
To Reproduce
- Compile the program below and statically link it to libblosc2, ensure that
CPU_HAS_SIMD == 1
inplugins/filters/bytedelta/bytedelta.c
. Rename this binary tobug_simd
. - Hardcode CPU_HAS_SIMD = 0 in
filters/bytedelta/bytedelta.c
, recompile libblosc2, and recompile and statically relink the program below. Rename this tobug_generic
. ./bug_simd 1
# Successful roundtrip data <-> schunk !./bug_generic 1
# Successful roundtrip data <-> schunk !./bug_simd
# Decompressed data differs from original 116, 231382041, 231371503!./bug_simd 1
# Successful roundtrip data <-> schunk !./bug_generic
# Decompressed data differs from original 116, 231382041, 231392835!
#include <stdio.h>
#include <blosc2.h>
#include "blosc2/filters-registry.h"
#define NCHUNKS 1
#define NTHREADS 2
#define TYPESIZE 4
#define CHUNKSIZE (TYPESIZE * 29) // (CHUNKSIZE / TYPESIZE) % 16 must != 0 to exploit this bug.
int32_t ref_data[CHUNKSIZE] = {
546303464, 823102828, 763542202, 46073357, 639725367, 831718428,
754955461, 313104751, 556923531, 902333302, 874693449, 244457120,
42880164, 677450262, 909282669, 716283249, 231382041, 835942381,
341227635, 956334593, 656239542, 588781027, 805567515, 504325858,
734152813, 220178903, 436001506, 691382423, 534977848
};
int main(int argc, char **argv) {
blosc2_init();
static int32_t data_dest[CHUNKSIZE];
int32_t isize = CHUNKSIZE * sizeof(int32_t);
blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
cparams.compcode = BLOSC_ZSTD;
cparams.filters[BLOSC2_MAX_FILTERS - 1] = BLOSC_FILTER_BYTEDELTA;
cparams.filters_meta[BLOSC2_MAX_FILTERS - 1] = 7;
blosc2_dparams dparams = BLOSC2_DPARAMS_DEFAULTS;
blosc2_schunk* schunk;
/* Create a super-chunk container */
cparams.typesize = TYPESIZE;
cparams.nthreads = NTHREADS;
dparams.nthreads = NTHREADS;
char *urlpath = (char *)"/tmp/dir1.b2frame";
blosc2_storage storage = {.contiguous=false, .urlpath=urlpath, .cparams=&cparams, .dparams=&dparams};
if (argc > 1) {
/* Remove the directory and write new data */
blosc2_remove_dir(storage.urlpath);
schunk = blosc2_schunk_new(&storage);
blosc2_schunk_append_buffer(schunk, ref_data, isize);
}
/* Read the chunks that were written before */
schunk = blosc2_schunk_open(urlpath);
blosc2_schunk_decompress_chunk(schunk, 0, data_dest, isize);
for (int i = 0; i < CHUNKSIZE; i++) {
if (data_dest[i] != ref_data[i]) {
printf("Decompressed data differs from original %d, %d, %d!\n",
CHUNKSIZE, ref_data[i], data_dest[i]);
return -1;
}
}
printf("Successful roundtrip data <-> schunk !\n");
blosc2_schunk_free(schunk);
blosc2_destroy();
return 0;
}
Expected behavior
Steps 3 through 4 each print Successful roundtrip data <-> schunk !
Logs
N/A
System information:
- OS: Ubunt 22.04
- Compiler: clang-13
- Version 0787e1b
Additional context
N/A
Yeah I guess a proper fix would be best for now (the more time passes, the harder it is to create a proper fix).