Blosc/c-blosc2

inline function `swap_store` in `blosc2.h`

DimitriPapadopoulos opened this issue · 12 comments

Describe the bug

Why is the inline function swap_store part of the API?

It is mostly used for int32_t and int64_t. In that case, calls to malloc and memcpy are overkill. Just use:

uint32_t swap32(uint32_t x) {
  return ((x & 0x000000FF) << 24) | ((x & 0x0000FF00) <<  8) |
         ((x & 0x00FF0000) >>  8) | ((x & 0xFF000000) >> 24);
}

uint64_t swap64(uint64_t x) {
  return ((x & 0x00000000000000FF) << 56) | ((x & 0x000000000000FF00) << 40) | ((x & 0x0000000000FF0000) << 24) | ((x & 0x00000000FF000000) <<  8) | 
         ((x & 0x000000FF00000000) >>  8) | ((x & 0x0000FF0000000000) >> 24) | ((x & 0x00FF000000000000) >> 40) | ((x & 0xFF00000000000000) >> 56);
}

Actually, you should use compiler intrinsics if they exist:

2 bytes 4 bytes 8 bytes
MSVC _byteswap_ushort _byteswap_ulong _byteswap_uint64
GCC __builtin_bswap16 __builtin_bswap32 __builtin_bswap64
Clang __builtin_bswap16 __builtin_bswap32 __builtin_bswap64

To Reproduce

c-blosc2/include/blosc2.h

Lines 2121 to 2127 in 65a1a30

static inline void swap_store(void *dest, const void *pa, int size) {
uint8_t *pa_ = (uint8_t *) pa;
uint8_t *pa2_ = (uint8_t*)malloc((size_t) size);
int i = 1; /* for big/little endian detection */
char *p = (char *) &i;
if (p[0] == 1) {

Expected behavior

Remove from the API.

Rewrite using compiler intrinsics and fall back to the above piece of code.

Additional context

See xxHash implementation:

/*!
 * @internal
 * @fn xxh_u32 XXH_swap32(xxh_u32 x)
 * @brief A 32-bit byteswap.
 *
 * @param x The 32-bit integer to byteswap.
 * @return @p x, byteswapped.
 */
#if defined(_MSC_VER)     /* Visual Studio */
#  define XXH_swap32 _byteswap_ulong
#elif XXH_GCC_VERSION >= 403
#  define XXH_swap32 __builtin_bswap32
#else
static xxh_u32 XXH_swap32 (xxh_u32 x)
{
    return  ((x << 24) & 0xff000000 ) |
            ((x <<  8) & 0x00ff0000 ) |
            ((x >>  8) & 0x0000ff00 ) |
            ((x >> 24) & 0x000000ff );
}
#endif

Actually, you already have such functions:

/* Reverse swap bits in a 32-bit integer */
static inline int32_t bswap32_(int32_t a) {
#if defined (__GNUC__)
return __builtin_bswap32(a);
#elif defined (_MSC_VER) /* Visual Studio */
return _byteswap_ulong(a);
#else
a = ((a & 0x000000FF) << 24) |
((a & 0x0000FF00) << 8) |
((a & 0x00FF0000) >> 8) |
((a & 0xFF000000) >> 24);
return a;
#endif
}

You even have a similar endian_handler function:

static inline void endian_handler(bool little, void *dest, const void *pa, int size)
{
bool little_endian = is_little_endian();
if (little_endian == little) {
memcpy(dest, pa, size);
}
else {
uint8_t* pa_ = (uint8_t*)pa;
uint8_t pa2_[8];
switch (size) {
case 8:
pa2_[0] = pa_[7];
pa2_[1] = pa_[6];
pa2_[2] = pa_[5];
pa2_[3] = pa_[4];
pa2_[4] = pa_[3];
pa2_[5] = pa_[2];
pa2_[6] = pa_[1];
pa2_[7] = pa_[0];
break;
case 4:
pa2_[0] = pa_[3];
pa2_[1] = pa_[2];
pa2_[2] = pa_[1];
pa2_[3] = pa_[0];
break;
case 2:
pa2_[0] = pa_[1];
pa2_[1] = pa_[0];
break;
case 1:
pa2_[0] = pa_[0];
break;
default:
BLOSC_TRACE_ERROR("Unhandled size: %d.", size);
}
memcpy(dest, pa2_, size);
}
}

IIRC we made this in order to avoid replicating the code in codecs, which do not have access to the blosc2 internals. See e.g. 8359e6d

I see. How about moving that code to plugins/plugin_utils.h?

I see. How about moving that code to plugins/plugin_utils.h?

+1

Also, swap_store is used only on 32- and 64-bits integers:

$ grep -R swap_store
blosc/b2nd.c:    swap_store(pmeta, shape + i, sizeof(int64_t));
blosc/b2nd.c:    swap_store(pmeta, chunkshape + i, sizeof(int32_t));
blosc/b2nd.c:    swap_store(pmeta, blockshape + i, sizeof(int32_t));
blosc/b2nd.c:  swap_store(pmeta, &dtype_len, sizeof(int32_t));
plugins/codecs/zfp/blosc2-zfp.c:          swap_store(blockmeta + i, pmeta, sizeof(int32_t));
plugins/plugin_utils.c:    swap_store(shape + i, pmeta, sizeof(int64_t));
plugins/plugin_utils.c:    swap_store(chunkshape + i, pmeta, sizeof(int32_t));
plugins/plugin_utils.c:    swap_store(blockshape + i, pmeta, sizeof(int32_t));
include/b2nd.h:    swap_store(shape + i, pmeta, sizeof(int64_t));
include/b2nd.h:    swap_store(chunkshape + i, pmeta, sizeof(int32_t));
include/b2nd.h:    swap_store(blockshape + i, pmeta, sizeof(int32_t));
include/b2nd.h:    swap_store(&dtype_len, pmeta, sizeof(int32_t));
include/blosc2.h:static inline void swap_store(void *dest, const void *pa, int size) {
$ 

I plan on adding specialised functions for those types only (and maybe 16-bits integers just in case).

Actually, (some) codecs do access blosc2 internals – at the very least they include headers from blosc.

For example, while compiling plugins/codecs/zfp/blosc2-zfp.c, the compiler complains about redefinitions of the new swap functions in plugins/plugin_utils.h, with the previous definitions in blosc/blosc-private.h. That's because plugins/codecs/zfp/blosc2-zfp.c includes private blosc header files:

#include "blosc-private.h"
#include "frame.h"
#include "blosc2/codecs-registry.h"
#include "zfp.h"
#include "blosc2-zfp.h"
#include <math.h>
#include "context.h"

What do you mean exactly by “codecs [...] do not have access to the blosc2 internals”?

I don't remember exactly how things went, but probably we've made swap_store public without thinking too much. Having said that, recently we have introduced the concept of dynamic plugins, and I'd say that swap_store in blosc2.h can be good for them.

Now that I look into the code, a possibility is to move these functions to include/blosc2/blosc2-common.h which is distributed in binary (devel) packages, and that would be good for plugin developers?

So, it would be part of the API after all.

I will move/merge everything from include/blosc2.h and blosc/blosc-private.h to include/blosc2/blosc2-common.h.

These two functions are (almost) identical:

  • swap_store() from include/blosc2.h
  • endian_handler() from blosc/blosc-private.h

These functions are really strange, it feels like they were written as an example not to follow:

  • they were written as if they could swap any number of bytes,
  • yet they swap only 1, 2, 4 or 8 bytes,
  • the fail to optimise these few supported cases with compiler intrinsics,
  • they allocate memory without reason as far as I can see,
  • when they fail, they leave rubbish behind and just output the message Unhandled nitems.

I'll see if we can get rid of them, and stick to functions specialised for 2, 4 or 8 bytes.

Any progress on this one?

I can't recall why this issue is stalled. I guess I'm not entirely certain how to fix this. Ideally, these functions should be removed from the API.