Original data size is greater than the deserialized one
YarShev opened this issue · 1 comment
YarShev commented
Hi there,
I am trying to serialize and deserialize an object using pickle protocol 5. Looking at the data size of the object before serialization and after deserialization, I wonder why the original data size is greater than the deserialized one. Is that an issue or expected behavior? Note that this is a simplified example; in the original code, serialization takes place in one process and deserialization in another to exchange data between processes.
import numpy as np
import msgpack
import pickle as pkl
import sys

buffers = []

def callback(pickle_buffer):
    buffers.append(pickle_buffer)
    return False

def encode(data):
    packed_data = pkl.dumps(data, protocol=5, buffer_callback=callback)
    return {"__pickle5_custom__": True, "as_bytes": packed_data}

def decode(packed_data):
    return pkl.loads(packed_data["as_bytes"], buffers=buffers)

array = np.array([1, 2, 3] * 10000)
packed_data = msgpack.packb(array, default=encode, strict_types=True)
unpacked_data = msgpack.unpackb(packed_data, object_hook=decode, strict_map_key=False)
>>> print(sys.getsizeof(array))
240112
>>> print(sys.getsizeof(unpacked_data))
112
>>> print(all(np.equal(array, unpacked_data)))
True
Thanks in advance.
methane commented
It is totally unrelated to this project. That behavior can be reproduced without msgpack.
-packed_data = msgpack.packb(array, default=encode, strict_types=True)
-unpacked_data = msgpack.unpackb(packed_data, object_hook=decode, strict_map_key=False)
+encoded = encode(array)
+unpacked_data = decode(encoded)
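Applying that change gives a minimal reproduction with no msgpack involved. The sketch below is illustrative, not part of the original thread: the OWNDATA and nbytes checks are additions, and it assumes NumPy's ndarray.__sizeof__ counts the data buffer only when the array owns its data, which is consistent with the numbers reported above.

import pickle as pkl
import sys

import numpy as np

array = np.array([1, 2, 3] * 10000)

# Collect out-of-band buffers; list.append returns None (falsy),
# so each buffer is kept out of the pickle stream itself.
buffers = []
packed_data = pkl.dumps(array, protocol=5, buffer_callback=buffers.append)
unpacked_data = pkl.loads(packed_data, buffers=buffers)

print(sys.getsizeof(array))          # 240112: 112-byte header + 240000 bytes of owned data
print(sys.getsizeof(unpacked_data))  # 112: header only, the data is not owned
print(unpacked_data.flags["OWNDATA"])        # False: a view over the out-of-band buffer
print(array.nbytes == unpacked_data.nbytes)  # True: both payloads are 240000 bytes

In other words, the size difference looks expected: the round-tripped array is a zero-copy view over the out-of-band buffer, so sys.getsizeof reports only the ndarray header, while .nbytes shows the payloads are identical.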