Multi-dimensional memoryview breaks packing
pohlt opened this issue · 11 comments
NumPy 1.24 seems to use multi-dimensional memoryviews when getting array data with array.data
. msgpack does not play nicely with such multi-dimensional views.
Example
Here's a crashing example without NumPy, but creating a multi-dimensional memoryview by hand:
import msgpack
def mutli_dim_memoryview():
view = memoryview(b"\00" * 6)
for data in [
view, # fine
view.cast(view.format, (3, 2)) # crashes when unpacking
]:
print("view shape:", data.shape)
packed = msgpack.packb(data)
print("packed:", packed)
unpacked = msgpack.unpackb(packed)
print("unpacked:", unpacked)
Output
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
[...]
unpacked = msgpack.unpackb(packed)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.
Mitigations
- Casting the view w/o a shape creates a 1D view without re-allocating memory (see Python docs):
data.case(data.format)
- Generate
bytes
either bydata.tobytes()
orbytes(data)
. I'm not sure if that duplicates the memory buffer.
Of course, it would be great if msgpack could do that internally when packing a memoryview.
Thanks for this great package! ❤️
The packer actually requests a buffer object with the PyBUF_SIMPLE
flag, so there should be no shape, strides, or suboffsets. See here. The API also says that "Without stride information, the buffer must be C-contiguous."
As long as I'm not misunderstanding something here Numpy should either return a 1D C-contiguous buffer (possibly creating a copy of non-contiguous arrays first), or return an error if it can't do that. So maybe Numpy is at fault here?
@jfolz, I don't fully understand your comment.
When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data
, so I don't blame NumPy for returning a multi-dimensional memoryview.
But even without NumPy: msgpack packs multi-dimensional memoryviews incorrectly as shown in the example, wouldn't you agree?
I can not reproduce.
$ python -VV
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
$ pip freeze | grep msgpack
msgpack==1.0.4
$ cat x.py
import msgpack
def multi_dim_memoryview():
view = memoryview(b"\00" * 6)
for data in [
view, # fine
view.cast(view.format, (3, 2)) # crashes when unpacking
]:
print("view shape:", data.shape)
packed = msgpack.packb(data)
print("packed:", packed)
unpacked = msgpack.unpackb(packed)
print("unpacked:", unpacked)
multi_dim_memoryview()
$ python x.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
I tested Python 3.10 and 3.11 but output is same.
When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using
ndarray.data
, so I don't blame NumPy for returning a multi-dimensional memoryview.
You're right, I misunderstood what you were trying to do. Numpy is not to blame, as it's job is done after you get the memoryview
object out of it. My point still stands though, if the memoryview
cannot produce a nice and simple contiguous buffer object it should return an error and packing should in turn fail.
@methane , that's weird. I reproduced your steps above in a fresh venv with only msgpack installed:
$ uname -a
Linux big 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ python -VV
Python 3.11.1 (main, Dec 7 2022, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)]
$ pip freeze | grep msgpack
msgpack==1.0.4
$ cat msgpack_test.py
import msgpack
def multi_dim_memoryview():
view = memoryview(b"\00" * 6)
for data in [
view, # fine
view.cast(view.format, (3, 2)) # crashes when unpacking
]:
print("view shape:", data.shape)
packed = msgpack.packb(data)
print("packed:", packed)
unpacked = msgpack.unpackb(packed)
print("unpacked:", unpacked)
multi_dim_memoryview()
$ python msgpack_test.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
File "/home/xxx/msgpack_test.py", line 15, in <module>
multi_dim_memoryview()
File "/home/xxx/msgpack_test.py", line 11, in multi_dim_memoryview
unpacked = msgpack.unpackb(packed)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.
As you can see, the bin array has a correct size of 6 in the first case and an incorrect size 3 (the size of the first dimension) in the second case.
@jfolz The memoryview is contiguous, but it's multi-dimensional. The question is what msgpack should do with multi-dimensional memoryviews:
- Treat it as a flat memoryview and pack and unpack it like that? This would mean that the unpacked memoryview is slightly different: same data, but different shape (flattened). I think that's the current behavior (at least for @methane who currently cannot reproduce the issue).
- Refuse packing a multi-dimensional memoryview? Ok, but impractical, IMHO.
Could be another configuration parameter.
Now I got it. Fallback module packs it wrong.
(venv) [root@978f0846d90b ~]# MSGPACK_PUREPYTHON=1 python mptest.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
File "/root/mptest.py", line 15, in <module>
multi_dim_memoryview()
File "/root/mptest.py", line 12, in multi_dim_memoryview
unpacked = msgpack.unpackb(packed)
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.
I released 1.0.5rc1. Please try it.
Issue is fixed with 1.0.5rc1. Thanks a lot!
But since you now provide wheels for 3.11, I'm no longer using the fallback code, I guess.