Allow, at least as an option, copyless encoding

Question

Closed this issue 6 years ago · 3 comments

The 'encode' function uses the numpy 'tobytes' method, which makes a copy of the data.
(See the documentation here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tobytes.html)

While this is perfect for most use cases, we are using this module in a high-throughput data processing environment, and copyless encoding would allow our code to be faster and use less memory. This could be achieved by exposing the underlying buffer of the numpy array: if 'a' is a numpy array, 'a.data' can substitute for 'a.tobytes()'.
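
To illustrate the difference, here is a minimal sketch (not the library's actual code). It assumes a C-contiguous array, and the variable names are only for illustration. 'a.tobytes()' returns a new bytes object, while 'a.data' is a memoryview over the array's own buffer:

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)  # C-contiguous example array

copied = a.tobytes()  # new bytes object holding a copy of the data
view = a.data         # memoryview over the array's own buffer, no copy

# Wrap both buffers as arrays (without copying) to compare them with `a`.
from_copy = np.frombuffer(copied, dtype=a.dtype).reshape(a.shape)
from_view = np.asarray(view)

copy_shares = np.shares_memory(a, from_copy)  # the bytes copy is separate memory
view_shares = np.shares_memory(a, from_view)  # the view aliases the array's memory

# At this point both expose identical raw bytes.
same_bytes = view.tobytes() == copied

# A later write to `a` is visible through the view but not in the copy.
a[0, 0] = 42.0
```

Because the view aliases the array's memory, the array must not be modified between encoding and sending if the original contents are what should go out.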

Thank you very much for your library and your work!

Valmar

Answer 1 · 2018-09-20T02:39:31.000Z

I tried implementing this in the copyless branch. Speed improvements in serialization seem rather small, though. Let me know if this is what you were thinking of - if so, I'll merge it into master.

Answer 2 · 2018-09-24T09:51:49.000Z

Thanks. This is exactly what I was thinking about. I am a scientist in the field of x-ray diffraction science. We use msgpack (and your library) at scientific facilities like the European XFEL: https://www.xfel.eu/

We pipe high-resolution image data through msgpack (and ZMQ) at very high throughput (hundreds of 1024x1024 float numpy arrays per second), so anything we can save in memory and speed quickly adds up!

Thanks again!

Answer 3 · 2018-09-26T01:45:41.000Z

Since both the copyless and non-copyless implementations are non-destructive, there didn't seem to be much reason to continue maintaining the latter. I therefore switched over to the copyless implementation entirely in the latest version.

Glad that this little package is making your research easier!