xnd-project/libxnd

[Feature] XND as a "Serialization" and "Deserialization" library

Closed this issue · 3 comments

I see a lot of use cases for XND as a "serialization" and "deserialization" library, both for networking and for files, because it only requires a memory copy. It would be worth putting thought into how XND fits into this picture.

File Representation

For example, Feather provides a way of sharing data that is compatible with Python and R. Apache Arrow provides support for Feather along with a few other formats. NumPy provides support for writing to a .npy file.

I believe that XND is special in that it provides a superset of these features.

Networking for Distributed Applications

For this, it looks like direct access to the buffers, and documentation on using them, would be beneficial.

In scientific computing, MPI is still a dominant technology for networking. Since XND does not require any special transformations, RDMA should work well and we should get great performance with MPI. I believe that XND would fit extremely well in this space because it provides a generalized container for transferring the data. ADIOS tries to address this issue with XML descriptions of its data: https://www.olcf.ornl.gov/center-projects/adios/
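To illustrate what "no special transformations" means in practice, here is a minimal sketch that sends a contiguous buffer over a socket and receives it directly into preallocated memory. It uses only the Python standard library as a stand-in: `array.array` plays the role of the container, and `send_buffer` and `recv_into_buffer` are hypothetical helper names. It is not the XND API, and it does not use MPI or RDMA.

```python
import array
import socket


def send_buffer(sock, buf):
    """Send any buffer-protocol object without building an intermediate bytes object."""
    view = memoryview(buf).cast("B")  # flat byte view over the same memory
    sock.sendall(view)
    return view.nbytes


def recv_into_buffer(sock, buf):
    """Fill a preallocated, writable buffer directly from the socket."""
    view = memoryview(buf).cast("B")
    received = 0
    while received < view.nbytes:
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("peer closed before the full buffer arrived")
        received += n
    return received


src = array.array("q", [1, 2, 3, 4])  # four int64 values, contiguous in memory
dst = array.array("q", [0, 0, 0, 0])  # receiver allocates the same layout up front

a, b = socket.socketpair()
with a, b:
    send_buffer(a, src)
    recv_into_buffer(b, dst)

print(dst.tolist())
```

The receiver needs to know the layout in advance, which is where a serialized type description would come in.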

In the general networking space there are many competing technologies, but none of them seems well positioned for large data. Protocol Buffers explicitly states that it is intended for small messages (< 10 MB), as do many others in this space.

skrah commented

Thank you for opening this topic. Yes, I think XND would be highly useful for this. Since the types can already be serialized, it is a matter of exposing and dumping the data pointer and serializing bitmaps.

The latter can also be optional at first, i.e. it is OK to raise NotImplemented if the type contains bitmaps.
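As a rough sketch of the idea (a serialized type plus a raw dump of the data, with bitmaps rejected for now), the following standard-library example writes a length-prefixed type string followed by the data bytes. The header layout, the function names `dump` and `load`, and the `has_bitmaps` flag are all assumptions made for illustration; this is not the format libxnd uses.

```python
import array
import io
import struct


def dump(fileobj, type_str, data, has_bitmaps=False):
    """Write a header (two uint64 lengths), the type string, then the raw data."""
    if has_bitmaps:
        # Bitmap support is deliberately left out of this sketch.
        raise NotImplementedError("types with bitmaps are not supported")
    encoded = type_str.encode("utf-8")
    raw = memoryview(data).cast("B")
    fileobj.write(struct.pack("<QQ", len(encoded), raw.nbytes))
    fileobj.write(encoded)
    fileobj.write(raw)


def load(fileobj):
    """Read back the type string and the raw data bytes."""
    type_len, data_len = struct.unpack("<QQ", fileobj.read(16))
    type_str = fileobj.read(type_len).decode("utf-8")
    raw = fileobj.read(data_len)
    return type_str, raw


buf = io.BytesIO()  # stands in for a file opened in binary mode
values = array.array("q", [1, 2, 3, 4])
dump(buf, "2 * 2 * int64", values)

buf.seek(0)
type_str, raw = load(buf)
restored = array.array("q", raw)  # reinterpret the bytes as int64 values
print(type_str, restored.tolist())
```

The data is written in native byte order here, so a real format would also have to record endianness.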

I agree that the first topic, writing to a file, is much more approachable and would make the second part easier once implemented. I am still learning the inner workings of XND (currently playing around with a Python script and inspecting the data structure with gdb). I will look into this and the approaches that could be taken.

skrah commented

Value and type are now serialized together in a single string:

>>> from xnd import xnd
>>> x = xnd([[1, 2], [3, 4]])
>>> s = x.serialize()
>>> s
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x03\x01\x00\x00\x00\x00\x02\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x03\x01\x00\x00\x00\x00\x01\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x1b\x01\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x08\x00 \x00\x00\x00\x00\x00\x00\x00'
>>> x.deserialize(s)
xnd([[1, 2], [3, 4]], type='2 * 2 * int64')
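Judging only from the output shown above, the first 32 bytes of the serialized string look like the four int64 values in little-endian order, with the type information following. A small standard-library check of that reading (an observation about this one example, not a description of the format):

```python
import struct

# First 32 bytes of the serialized string shown above.
head = (
    b"\x01\x00\x00\x00\x00\x00\x00\x00"
    b"\x02\x00\x00\x00\x00\x00\x00\x00"
    b"\x03\x00\x00\x00\x00\x00\x00\x00"
    b"\x04\x00\x00\x00\x00\x00\x00\x00"
)

# Interpret the bytes as four little-endian signed 64-bit integers.
flat = struct.unpack("<4q", head)

# Regroup into the 2 * 2 shape given by the type string.
rows = [list(flat[i:i + 2]) for i in range(0, 4, 2)]
print(rows)  # [[1, 2], [3, 4]]
```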