UnicodeDecodeError when decoding Numpy types
Closed · 1 comment
Tronic commented
This module should not override the msgpack default parameters with use_bin_type=0 and raw=True like it does. This causes UnicodeDecodeErrors and will also mix up str and bytes types elsewhere. Manually specifying use_bin_type=True and raw=False avoids the problems:
In [1]: import msgpack_numpy as mp, numpy as np
In [2]: mp.packb(np.array([-0.0]))
Out[2]: b'\x85\xa2nd\xc3\xa4type\xa3<f8\xa4kind\xa0\xa5shape\x91\x01\xa4data\xa8\x00\x00\x00\x00\x00\x00\x00\x80'
In [3]: mp.unpackb(_)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 7: invalid start byte
In [4]: mp.packb(np.array([-0.0]), use_bin_type=True)
Out[4]: b'\x85\xc4\x02nd\xc3\xc4\x04type\xa3<f8\xc4\x04kind\xc4\x00\xc4\x05shape\x91\x01\xc4\x04data\xc4\x08\x00\x00\x00\x00\x00\x00\x00\x80'
In [5]: mp.unpackb(_, raw=False)
Out[5]: array([-0.])
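For anyone wondering why the byte at position 7 is the culprit, here is a rough standard-library sketch of what seems to be going on. It does not call msgpack or msgpack_numpy. The helper wire_kind is a hypothetical name used only for illustration; it classifies a header byte using the tag ranges from the MessagePack format (fixstr 0xa0-0xbf, str 8/16/32, bin 8/16/32). The payload is rebuilt with struct on the assumption that the array is stored as little-endian float64, which matches the '<f8' type shown in the packed output.

```python
import struct

# The 8 payload bytes of np.array([-0.0]) as little-endian float64 ('<f8').
payload = struct.pack("<d", -0.0)
print(payload)  # b'\x00\x00\x00\x00\x00\x00\x00\x80'

# A decoder asked to return str must UTF-8 decode anything tagged as a
# string. 0x80 cannot start a UTF-8 sequence, so decoding fails at index 7.
try:
    payload.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)


def wire_kind(tag: int) -> str:
    """Classify a MessagePack header byte as 'str', 'bin', or 'other'.

    Hypothetical helper for illustration; not part of msgpack's API.
    """
    if 0xA0 <= tag <= 0xBF or tag in (0xD9, 0xDA, 0xDB):
        return "str"  # fixstr, str 8, str 16, str 32
    if tag in (0xC4, 0xC5, 0xC6):
        return "bin"  # bin 8, bin 16, bin 32
    return "other"


# Header bytes that precede the payload in Out[2] and Out[4].
print(wire_kind(0xA8))  # str (fixstr of length 8, from use_bin_type=0)
print(wire_kind(0xC4))  # bin (bin 8, from use_bin_type=True)
```

With use_bin_type=0 the array data goes on the wire tagged as a string, so any unpacker that decodes strings will try to read raw float bytes as UTF-8. With use_bin_type=True the same bytes are tagged as binary and are left alone.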
lebedov commented
The msgpack defaults were explicitly changed in msgpack-numpy to provide more seamless behavior for both Python 2 and 3 in light of the differences in how they each handle string types. Given that Python 2 is now EOL, it seems safer to revert to the msgpack defaults.