lebedov/msgpack-numpy

Allow, at least as an option, copyless encoding


The 'encode' function uses the numpy 'tobytes' method, which makes a copy of the data.
(See the documentation here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tobytes.html)

While this is perfect for most use cases, we are using this module in a high-throughput data processing environment, and copyless encoding would allow our code to be faster and use less memory. This could be achieved by exposing the underlying buffer of the numpy array. If 'a' is a numpy array, 'a.data' can substitute for 'a.tobytes()'.
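
To illustrate the difference, here is a minimal sketch using only numpy (the array shape and dtype are arbitrary choices for the example, not taken from the library's code). It shows that 'tobytes()' produces an independent copy, while 'data' is a memoryview over the array's own buffer:

```python
import numpy as np

# Example array; shape and dtype are illustrative assumptions.
a = np.arange(6, dtype=np.float32).reshape(2, 3)

copied = a.tobytes()  # new bytes object: the array data is copied
view = a.data         # memoryview over the array's own buffer: no copy

# For a C-contiguous array, both expose the same raw bytes.
same_bytes = bytes(view) == copied

# An array built on top of the memoryview shares memory with the original.
shares = np.shares_memory(np.asarray(view), a)

# Later changes to the array are visible through the view, not through the copy.
a[0, 0] = 42.0
view_tracks_change = np.asarray(view)[0, 0] == 42.0
copy_is_stale = np.frombuffer(copied, dtype=np.float32)[0] == 0.0
```

One caveat, as far as I can tell: for a non-contiguous array (a strided slice, for example) the memoryview is not contiguous either, so a consumer that needs a contiguous buffer may still force a copy, e.g. via 'np.ascontiguousarray'.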

Thank you very much for your library and your work!

Valmar

I tried implementing this in the copyless branch. Speed improvements in serialization seem rather small, though. Let me know if this is what you were thinking of - if so, I'll merge it into master.

Thanks. This is exactly what I was thinking about. I am a scientist working in x-ray diffraction. We use msgpack (and your library) at scientific facilities like the European XFEL: https://www.xfel.eu/

We pipe high-resolution image data through msgpack (and ZMQ) at very high throughput (hundreds of 1024x1024 float numpy arrays per second), so anything we can save in memory and speed quickly adds up!

Thanks again!

Since both the copyless and non-copyless implementations are non-destructive, there didn't seem to be much reason to continue maintaining the latter. I therefore switched over to the copyless implementation entirely in the latest version.

Glad that this little package is making your research easier!