msgpack/msgpack-python

Original data size is greater than the deserialized one

YarShev opened this issue · 1 comment

Hi there,

I'm trying to serialize and deserialize an object using pickle protocol 5. Having looked at the data size of the object before serialization and after deserialization, I wonder why the original data size is greater than the deserialized one. Is that an issue or expected behavior? Note that this is a simplified example: in the original code, serialization takes place in one process and deserialization in another, in order to exchange data between processes.

import sys
import pickle as pkl

import msgpack
import numpy as np


# Out-of-band buffers collected by the pickle 5 buffer_callback.
buffers = []

def callback(pickle_buffer):
    buffers.append(pickle_buffer)
    return False  # a falsy return keeps the buffer out-of-band

def encode(data):
    packed_data = pkl.dumps(data, protocol=5, buffer_callback=callback)
    return {"__pickle5_custom__": True, "as_bytes": packed_data}

def decode(packed_data):
    return pkl.loads(packed_data["as_bytes"], buffers=buffers)

array = np.array([1, 2, 3] * 10000)

packed_data = msgpack.packb(array, default=encode, strict_types=True)
unpacked_data = msgpack.unpackb(packed_data, object_hook=decode, strict_map_key=False)

print(sys.getsizeof(array))                 # 240112
print(sys.getsizeof(unpacked_data))         # 112
print(all(np.equal(array, unpacked_data)))  # True

Thanks in advance.

It is totally unrelated to this project. That behavior is reproducible without msgpack.

-packed_data = msgpack.packb(array, default=encode, strict_types=True)
-unpacked_data = msgpack.unpackb(packed_data, object_hook=decode, strict_map_key=False)
+encoded = encode(array)
+unpacked_data = decode(encoded)
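
The size difference comes from how NumPy reports memory to sys.getsizeof: with protocol 5 the array's data travels out-of-band, and pickle.loads(..., buffers=...) rebuilds the array as a view over the supplied buffer rather than copying it. The restored array does not own its data, so sys.getsizeof counts only the ndarray header. Below is a minimal sketch without msgpack; the byte counts are those from the report above and are platform-dependent (the default integer dtype is int64 on most 64-bit platforms):

import pickle as pkl
import sys

import numpy as np

buffers = []
array = np.array([1, 2, 3] * 10000)

# list.append returns None (falsy), so every buffer stays out-of-band.
data = pkl.dumps(array, protocol=5, buffer_callback=buffers.append)
restored = pkl.loads(data, buffers=buffers)

# The restored array is backed by the out-of-band buffer and does not
# own its data, so sys.getsizeof() counts only the ndarray header.
print(array.flags.owndata, restored.flags.owndata)    # True False
print(sys.getsizeof(array), sys.getsizeof(restored))  # 240112 112
print(array.nbytes, restored.nbytes)                  # 240000 240000

Comparing array.nbytes instead of sys.getsizeof shows that no data was lost; the payload is simply owned by the out-of-band buffer rather than by the array object itself.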