msgpack does not unpack unicode strings correctly - should be unicode but is str
ChameleonRed opened this issue · 2 comments
ChameleonRed commented
This leads to problems because, after such a round trip, us.encode() is valid but decoded.encode() is invalid ('utf-8', Python 2.7.x). The type should not change during serialization.
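import msgpack
us = u'Hello World!'  # unicode text
s = us.encode()       # byte string (str on Python 2)
# Round-trip the unicode value and compare types
encoded = msgpack.packb(us)
decoded = msgpack.unpackb(encoded)
print type(us), type(decoded)
# Round-trip the byte string and compare types
encoded = msgpack.packb(s)
decoded = msgpack.unpackb(encoded)
print type(s), type(decoded)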
vsergeev commented
Perhaps this issue was meant for https://github.com/msgpack/msgpack-python?
This implementation (u-msgpack-python) behaves correctly for your test case. Substitute "import umsgpack as msgpack" for the import line and run:
$ python2 test.py
(<type 'unicode'>, <type 'unicode'>)
(<type 'str'>, <type 'str'>)
$
Add parentheses to the print statements to test it under Python 3:
$ python3 test.py
<class 'str'> <class 'str'>
<class 'bytes'> <class 'bytes'>
$
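For illustration, here is a minimal sketch of why a round trip can preserve the type: the MessagePack format has separate str and bin families, so the packed bytes themselves record which kind of value went in. The helpers pack_short and unpack_short are hypothetical names made up for this example and are not the API of msgpack-python or u-msgpack-python. The sketch assumes Python 3 and handles only short values (fixstr up to 31 bytes, bin 8 up to 255 bytes).

```python
# Hypothetical helpers for illustration only; not the API of any msgpack library.
# Covers just two MessagePack formats: fixstr (text) and bin 8 (raw bytes).


def pack_short(value):
    """Pack a short str or bytes value, tagging it with its type family."""
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 31:
            raise ValueError("sketch only handles fixstr (up to 31 bytes)")
        # fixstr: one byte 101xxxxx (xxxxx = length), then the UTF-8 bytes
        return bytes([0xA0 | len(data)]) + data
    if isinstance(value, bytes):
        if len(value) > 255:
            raise ValueError("sketch only handles bin 8 (up to 255 bytes)")
        # bin 8: marker 0xc4, one length byte, then the raw bytes
        return bytes([0xC4, len(value)]) + value
    raise TypeError("sketch only handles str and bytes")


def unpack_short(packed):
    """Unpack a value produced by pack_short, restoring its original type."""
    tag = packed[0]
    if (tag & 0xE0) == 0xA0:  # fixstr marker: decode back to text
        length = tag & 0x1F
        return packed[1:1 + length].decode("utf-8")
    if tag == 0xC4:  # bin 8 marker: return the bytes unchanged
        length = packed[1]
        return packed[2:2 + length]
    raise ValueError("unsupported format byte: 0x%02x" % tag)


us = "Hello World!"
s = us.encode("utf-8")

decoded_text = unpack_short(pack_short(us))
decoded_bytes = unpack_short(pack_short(s))

print(type(us), type(decoded_text))
print(type(s), type(decoded_bytes))
```

Because the first byte differs between the two families, the unpacker can tell text from bytes without guessing, which matches the type-preserving output shown above.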
vsergeev commented
You may want to take a look at https://github.com/minrk/umsgpack.