msgpack/msgpack-python

Multi-dimensional memoryview breaks packing

pohlt opened this issue · 11 comments

pohlt commented

NumPy 1.24 seems to use multi-dimensional memoryviews when getting array data with array.data. msgpack does not play nicely with such multi-dimensional views.

Example

Here's a crashing example without NumPy, but creating a multi-dimensional memoryview by hand:

import msgpack

def mutli_dim_memoryview():

    view = memoryview(b"\00" * 6)
    for data in [
        view,  # fine
        view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

Output

view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
    [...]
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Mitigations

  • Casting the view w/o a shape creates a 1D view without re-allocating memory (see Python docs): data.case(data.format)
  • Generate bytes either by data.tobytes() or bytes(data). I'm not sure if that duplicates the memory buffer.

Of course, it would be great if msgpack could do that internally when packing a memoryview.

Thanks for this great package! ❤️

jfolz commented

The packer actually requests a buffer object with the PyBUF_SIMPLE flag, so there should be no shape, strides, or suboffsets. See here. The API also says that "Without stride information, the buffer must be C-contiguous."
As long as I'm not misunderstanding something here Numpy should either return a 1D C-contiguous buffer (possibly creating a copy of non-contiguous arrays first), or return an error if it can't do that. So maybe Numpy is at fault here?

pohlt commented

@jfolz, I don't fully understand your comment.

When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data, so I don't blame NumPy for returning a multi-dimensional memoryview.

But even without NumPy: msgpack packs multi-dimensional memoryviews incorrectly as shown in the example, wouldn't you agree?

I can not reproduce.

$ python -VV
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]

$ pip freeze | grep msgpack
msgpack==1.0.4

$ cat x.py
import msgpack

def multi_dim_memoryview():

    view = memoryview(b"\00" * 6)
    for data in [
        view,  # fine
        view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

multi_dim_memoryview()

$ python x.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'

I tested Python 3.10 and 3.11 but output is same.

jfolz commented

When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data, so I don't blame NumPy for returning a multi-dimensional memoryview.

You're right, I misunderstood what you were trying to do. Numpy is not to blame, as it's job is done after you get the memoryview object out of it. My point still stands though, if the memoryview cannot produce a nice and simple contiguous buffer object it should return an error and packing should in turn fail.

pohlt commented

@methane , that's weird. I reproduced your steps above in a fresh venv with only msgpack installed:

$ uname -a
Linux big 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python -VV
Python 3.11.1 (main, Dec  7 2022, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)]

$ pip freeze | grep msgpack
msgpack==1.0.4

$ cat msgpack_test.py 
import msgpack

def multi_dim_memoryview():
    view = memoryview(b"\00" * 6)
    for data in [
      view,  # fine
      view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

multi_dim_memoryview()

$ python msgpack_test.py 
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/home/xxx/msgpack_test.py", line 15, in <module>
    multi_dim_memoryview()
  File "/home/xxx/msgpack_test.py", line 11, in multi_dim_memoryview
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

As you can see, the bin array has a correct size of 6 in the first case and an incorrect size 3 (the size of the first dimension) in the second case.

pohlt commented

@jfolz The memoryview is contiguous, but it's multi-dimensional. The question is what msgpack should do with multi-dimensional memoryviews:

  • Treat it as a flat memoryview and pack and unpack it like that? This would mean that the unpacked memoryview is slightly different: same data, but different shape (flattened). I think that's the current behavior (at least for @methane who currently cannot reproduce the issue).
  • Refuse packing a multi-dimensional memoryview? Ok, but impractical, IMHO.

Could be another configuration parameter.

Now I got it. Fallback module packs it wrong.

(venv) [root@978f0846d90b ~]# MSGPACK_PUREPYTHON=1 python mptest.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/root/mptest.py", line 15, in <module>
    multi_dim_memoryview()
  File "/root/mptest.py", line 12, in multi_dim_memoryview
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

I released 1.0.5rc1. Please try it.

pohlt commented

Issue is fixed with 1.0.5rc1. Thanks a lot!

pohlt commented

But since you now provide wheels for 3.11, I'm no longer using the fallback code, I guess.