sdd/kiddo

zero-copy with `rkyv`

Closed this issue ยท 11 comments

If I understand correctly, right now the rkyv impl isn't actually zero copy. I don't know if this is how it is usually done, but it should be possible to essentially copy paste the tree methods to implement them on ArchivedKdTree (with a few changes) so that you could query the ArchivedKdTree directly. This will dramatically decrease time-to-first-query when deserializing, but may add some pointer indirections and slightly slow down queries.

sdd commented

I'm pretty sure it is zero-copy as there would be no way that it would work as fast as it does right now if it weren't - it essentially just casts a pointer the archived file contents back to a KDTree as the memory layout is identical. See the implementation in the example here https://github.com/sdd/kiddo/blob/master/examples/rkyv.rs and the associated rkyv documentation at https://docs.rs/rkyv/latest/rkyv/trait.Archive.html.

The ArchivedKdTree type never needs to be used directly. The buffer containing the serialized tree gets transmuted into a KDTree by the rkyv macros without needing to refer to ArchivedKdTree at all.

(this is not quite the full story with the Fixed KDTree as the Fixed values don't support rkyv. But even then, the Fixed KDTree can just be transmutes to a KdTreeRK, which has eg u16s instead of Fixed16s, which can then be serialized by Rkyv, and the reverse sequence of events done during deserialization).

I'm โ‰ˆ75% certain of my statement. As mentioned in the rkyv book, Deserialize is not zero copy:

This provides a more or less a traditional deserialization with the added benefit of being sped up somewhat by having very compatible representations. It also incurs both the memory and performance penalties of traditional deserialization, so make sure that it's what you need before you use it. Deserialization is not required to access archived data as long as you can do so through the archived versions.

A good use for Deserialize is deserializing portions of archives. You can easily traverse the archived data to locate some subobject, then deserialize just that piece instead of the archive as a whole. This granular approach provides the benefits of both zero-copy deserialization as well as traditional deserialization.

With the ArchivedKdTree, you should be able to access the inner fields containing all the ArchivedLeafNode and ArchivedStemNode for queries. In theory, zero-copy deser of a data structure like this should be basically loading the data and then a few hundred nanoseconds of making pointers -- in your example the deserialize method takes orders of magnitude longer than that.

sdd commented

Aah - I see what you mean now, thanks. Interesting that I was able to get such a significant speedup switching from serde to rkyv before even though I was still using rkyv's deserialize. That blinded me to the fact that I was still not fully realising the potential of rkyv.

I'll implement some of the queries on ArchivedKdTree in that case and add some tests and benchmarks for that.

sdd commented

You were totally right :-)

I've tried implementing nearest_one on the float ArchivedKdTree and then updating the rkyv example to show the time-to-first-query-result for loading in a serialized tree in the following 4 scenarios:

  • no mmap, no zero-copy
  • mmap, no zero-copy
  • no mmap, zero-copy
  • mmap, zero-copy

Here are the results on my AMD Ryzen 9 5900X, for a 11M item tree that takes up 314Mb on disk:

image

You can see the changes and / or try for yourself here: #39

And this includes the memmapping or file loading etc.

The actual zc deser , i.e. archived_root, takes like zero time. Here it is for a 256^3 point tree I tried:

it takes 63 millis to load file into mem
it takes 750 nanos to archived_root
it takes 63 millis to deserialize tree

I had one question about the archive size. 512^3 [f32; 3] and 512^3 u64s give me โ‰ˆ2.6GB of data, but the archive is almost 4GB. That's quite the inflation. I assume it's due to the unused leaf/bucket space? Maybe we should do a custom ser/de for the leaf nodes?

sdd commented

Yes, a lot of that will be slack space in the leaf nodes as they will have an average utilisation of about 75% full. You'll have some trade-offs: any custom ser/de that changes the memory layout vs ArchivedKdTree will of course no longer be zero-copy.

One thing that could be tried is simply setting the contents of the empty space in the leaf nodes to all zeroes and then gzipping the rkyv file once it has been generated. The zeroing out should make the slack space more compressible.

sdd commented

I've made some progress this evening with implementing the query methods on ArchivedKdTree. I used macros so that there is no need to duplicate the code for the method bodies, like here: https://github.com/sdd/kiddo/pull/39/files#diff-77b09b6068234623cf62c9dc9efce807affd09f1af36dd806f9dad5d64974dd0R157-R189

I'm still working on ArchivedKdTree::load_from_file and KdTree::save_to_file methods to make loading and saving with both zero-copy and memmap more ergonomic, but I'm done for the night and will pick this up again another day.

That is quite the undertaking. Bravo.

Hopefully rkyv will make macro tools in the future to simplify this stuff.

sdd commented

The ArchivedKdTree::load_from_file and KdTree::save_to_file methods are proving tricky due to lifetime / ownership issues, so I'll defer them to another PR - but I've merged in an initial PR that implements the query methods on the float ArchivedKdTree.

sdd commented

I've released these changes as https://crates.io/crates/kiddo/2.1.0 ๐Ÿ‘๐Ÿผ

I'll close this, and address the load / save as soon as I get it working.

Thanks for pointing out my mistake with the deserialisation - it's super fast now! ๐Ÿ˜„