rdaum/rart-rs

Iter and range methods don't work

Opened this issue · 5 comments

Thanks for this excellent librarry!

I noticed that the iter and range methods don't work currently because only the leaf nodes are being iterated over, and not the inner nodes. Just curious if you plan to resolve this

rdaum commented

Thanks for poking at this and finding that. It doesn't surprise me they are broken, unfortunately; they've been on my list of things to poke at for a while because I'm not happy with the way they're implemented. Right now they'll do a bunch of excessive copies during iteration, and defeats half the purpose of the ART structure, which is fast sequential operations.

I would definitely like to fix this, when I get the time, not likely to be this week.

In the meantime, if you have a reproduction case, even a small fragment of code, feel free to provide. Regression tests more than welcome.

rdaum commented

So I've fixed .iter(), but .range() remains broken for now, pending more serious surgery.

The TLDR is that there's issues with the comparison of key values as passed in to .insert, .range, etc. and with the Partial representations that end up stored in the tree. When iterating we have to reconstruct the key bytes based on the various partials in the tree. With range the reconstructed keys need to be compared to the begin/end keys passed in for the Range, so that we can determine if we've hit the end (or start) of the range. Right now this is broken.

At some point hopefully soon I am going to rework how keys are produced, and change it so that users have to pass in a couple functors for key->bytes and partials->key conversion when making the tree. I think this will allow me to fix the range stuff properly.

There was also an issue with iteration and sort order. At one point I changed one of the child node mappings so that it wasn't internally sorted -- but instead by insertion order -- which had a fairly minor but real performance improvement. But it means that the tests that compared iteration order in the ART vs iteration order in a BTree were unstable. I've switched back to the SortedKeyedMapping for now, but will likely make it an option for the user as soon as I can get around to that refactoring.

@rdaum Do you have a sense for how complicated the key rework might be? I have a use case that might be a good fit for rart but I can't benchmark it without range, so I was wondering if I might be able to help somehow.

That sounds interesting, thank you! I'll take a look.

Maybe I could explain my use case in case you have any other ideas - maybe I could skip range somehow. I currently have a BTreeMap<u32, u32> but it's causing performance problems on range queries/insertion/removal (especially range queries, so some trade-off would probably be ok).

I'm interested in some kind of radix tree to represent the same thing but with fast mutable range searches, e.g., "give me all keys bounded by a range, maybe remove some nodes, then insert some more nodes" My thought was to write my own tree and organize the data around SIMD lanes somehow to accelerate the search, but it looks like rart already does most of that so maybe I could help get it to the point I'd like.

For what it's worth, I also know roughly how the keys will be distributed (over 0 to u32::MAX) and how far queries would span over those shards, so it's possible to bucket/shard them across multiple trees if that could help in some way (e.g., rart per shard and linear scans, although maybe that's a bad idea).