vsergeev/u-msgpack-python

msgpack does not unpack unicode string correctly - should be unicode but is str

ChameleonRed opened this issue · 2 comments

This leads to problems: after such a round trip, us.encode() is valid but decoded.encode() is invalid ('utf-8', Python 2.7.x). The type should not change during serialization.

import msgpack

us = u'Hello World!'
s = us.encode()

encoded = msgpack.packb(us)
decoded = msgpack.unpackb(encoded)

print type(us), type(decoded)

encoded = msgpack.packb(s)
decoded = msgpack.unpackb(encoded)

print type(s), type(decoded)

Perhaps this issue was meant for https://github.com/msgpack/msgpack-python?

This implementation (u-msgpack-python) behaves correctly for your test case. Substitute import umsgpack as msgpack for the import line and run:

$ python2 test.py
(<type 'unicode'>, <type 'unicode'>)
(<type 'str'>, <type 'str'>)
$

and add parentheses for print() to test it under Python 3:

$ python3 test.py
<class 'str'> <class 'str'>
<class 'bytes'> <class 'bytes'>
$
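
For background on why the type can survive the round trip: the MessagePack format has separate str and bin families, so a packer can tag text and raw bytes differently and an unpacker can hand back the matching type. Below is a rough, standard-library-only sketch of that idea. pack_short and unpack_short are hypothetical helpers written for this illustration, not functions from umsgpack or msgpack-python, and they only cover short values (fixstr and bin 8).

```python
# Illustration only: NOT the umsgpack implementation.
# Covers just two MessagePack formats:
#   fixstr (0xa0-0xbf): UTF-8 text of up to 31 bytes
#   bin 8  (0xc4):      raw bytes of up to 255 bytes


def pack_short(value):
    """Pack a short text or byte string, keeping the two types distinct."""
    if isinstance(value, bytes):
        if len(value) > 0xFF:
            raise ValueError("sketch supports at most 255 raw bytes")
        # bin 8: format byte 0xc4, one length byte, then the payload
        return b"\xc4" + bytes(bytearray([len(value)])) + value
    data = value.encode("utf-8")
    if len(data) > 31:
        raise ValueError("sketch supports at most 31 bytes of text")
    # fixstr: the length lives in the low 5 bits of the format byte
    return bytes(bytearray([0xA0 | len(data)])) + data


def unpack_short(data):
    """Unpack what pack_short produced, returning text for str and bytes for bin."""
    tag = bytearray(data)[0]
    if tag == 0xC4:
        length = bytearray(data)[1]
        return data[2:2 + length]
    if 0xA0 <= tag <= 0xBF:
        length = tag & 0x1F
        return data[1:1 + length].decode("utf-8")
    raise ValueError("unsupported format byte: 0x%02x" % tag)


# Same two inputs as test.py: one text string, one byte string.
for original in (u"Hello World!", b"Hello World!"):
    restored = unpack_short(pack_short(original))
    assert restored == original
    assert type(restored) is type(original)
```

The sketch avoids version-specific byte handling (it goes through bytearray), so it should behave the same under python2 and python3, like the test script above.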

You may want to take a look at https://github.com/minrk/umsgpack.