Blosc/python-blosc2

Incorrect result when packing and unpacking a recarray with padding bytes


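Round-tripping a recarray whose dtype contains padding bytes through pack_tensor and unpack_tensor should return an array with the same dtype. Instead, the unpacked array has a different dtype with an extra field.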

import blosc2
import numpy as np

print(blosc2.__version__)
print(np.__version__)

dtype = {
    "names": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"],
    "formats": [
        "<u8",
        "<i8",
        "<i8",
        "<u8",
        "<i4",
        "<u4",
        "<u4",
        "<i2",
        "i1",
        "i1",
        "i1",
        "<u8",
    ],
    "offsets": [0, 8, 16, 24, 32, 36, 40, 44, 46, 47, 48, 56],
    "itemsize": 64,
    "aligned": True,
}
arr = np.recarray(100, dtype=dtype)
print(type(arr), arr.dtype)
arr2 = blosc2.unpack_tensor(blosc2.pack_tensor(arr))
print(type(arr2), arr2.dtype)

Output

3.0.0b4
1.26.4
<class 'numpy.recarray'> (numpy.record, {'names': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l'], 'formats': ['<u8', '<i8', '<i8', '<u8', '<i4', '<u4', '<u4', '<i2', 'i1', 'i1', 'i1', '<u8'], 'offsets': [0, 8, 16, 24, 32, 36, 40, 44, 46, 47, 48, 56], 'itemsize': 64, 'aligned': True})
<class 'numpy.ndarray'> [('a', '<u8'), ('b', '<i8'), ('c', '<i8'), ('d', '<u8'), ('e', '<i4'), ('f', '<u4'), ('g', '<u4'), ('h', '<i2'), ('i', 'i1'), ('j', 'i1'), ('k', 'i1'), ('f11', 'V7'), ('l', '<u8')]

Note the additional field f11 (type V7) in the unpacked dtype. What is it?

Yes, I can reproduce this. If you can find the root of the issue, shout!

I think this is a padding bug. If I remove the offsets, itemsize, and aligned parameters, the output is correct:

import blosc2
import numpy as np

print(blosc2.__version__)
print(np.__version__)

dtype = {
    "names": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"],
    "formats": [
        "<u8",
        "<i8",
        "<i8",
        "<u8",
        "<i4",
        "<u4",
        "<u4",
        "<i2",
        "i1",
        "i1",
        "i1",
        "<u8",
    ],
    # "offsets": [0, 8, 16, 24, 32, 36, 40, 44, 46, 47, 48, 56],
    # "itemsize": 64,
    # "aligned": True,
}
arr = np.recarray(100, dtype=dtype)
print(type(arr), arr.dtype)
arr2 = blosc2.unpack_tensor(blosc2.pack_tensor(arr))
print(type(arr2), arr2.dtype)
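
Until the root cause is found, one possible workaround is to copy the array into a packed layout before calling pack_tensor. The following is only a sketch: it uses numpy.lib.recfunctions.repack_fields, which is a NumPy helper and not part of blosc2, and it assumes that dropping the alignment is acceptable for your use case. It does not call blosc2 at all; it only shows that the repacked dtype no longer has a hole.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same aligned layout as in the report (7 padding bytes before "l").
aligned = np.dtype({
    "names": list("abcdefghijkl"),
    "formats": ["<u8", "<i8", "<i8", "<u8", "<i4", "<u4", "<u4",
                "<i2", "i1", "i1", "i1", "<u8"],
    "offsets": [0, 8, 16, 24, 32, 36, 40, 44, 46, 47, 48, 56],
    "itemsize": 64,
    "aligned": True,
})
arr = np.zeros(100, dtype=aligned)

# repack_fields copies the data into a dtype with no gaps between fields.
packed = rfn.repack_fields(arr)

# The packed dtype is smaller and its descr has no unnamed padding entries.
print(arr.dtype.itemsize, packed.dtype.itemsize)
has_hole = any(entry[0] == "" for entry in packed.dtype.descr)
print("padding entries in descr:", has_hole)
```

The packed array holds the same field values, so it should behave like the second script above, where the offsets, itemsize, and aligned parameters are omitted.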

The f11 field is the padding hole: the 7 unused bytes between k (offset 48) and l (offset 56), shown as ? in the layout below. I don't know why it becomes a field.

0        8        16       24       32       40       48       56       64
|--------|--------|--------|--------|--------|--------|--------|--------|
|aaaaaaaa|bbbbbbbb|cccccccc|dddddddd|eeeeffff|gggghhij|k???????|llllllll|
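
One way a padding hole can turn into a named field, using NumPy alone: dtype.descr lists the hole as an entry with an empty name, and rebuilding a dtype from that list gives the entry an automatic name based on its position. Whether pack_tensor actually serializes the dtype through descr is an assumption here and has not been confirmed in this thread; the sketch below only shows that such a round trip yields the same f11 name.

```python
import numpy as np

# Same layout as in the report: "k" occupies byte 48, "l" starts at byte 56.
dt = np.dtype({
    "names": list("abcdefghijkl"),
    "formats": ["<u8", "<i8", "<i8", "<u8", "<i4", "<u4", "<u4",
                "<i2", "i1", "i1", "i1", "<u8"],
    "offsets": [0, 8, 16, 24, 32, 36, 40, 44, 46, 47, 48, 56],
    "itemsize": 64,
    "aligned": True,
})

# descr represents the 7-byte hole as a void entry with an empty name.
descr = dt.descr
unnamed = [(i, entry[1]) for i, entry in enumerate(descr) if entry[0] == ""]
print(unnamed)

# Rebuilding a dtype from descr turns that entry into a real field,
# named after its position in the list.
rebuilt = np.dtype(descr)
print(rebuilt.names)
print(rebuilt == dt)
```

The rebuilt dtype has the same itemsize as the original, so the raw bytes still line up, but it has 13 fields instead of 12 and no longer compares equal to the original dtype.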