georust/rstar

Post-order element traversal

Opened this issue · 11 comments

Cascaded unions are a desirable feature for the geo library. However, the algorithm requires a post-order traversal of the tree, and rstar's iteration order is currently unspecified.

I may be out of my depth here, but the data in an R-tree is only at the leaf nodes, right? So what does a post-order traversal mean for an R-tree? Also, does locate_with_selection_function help? It selectively opens each intermediate bounding box based on a custom predicate.

I think the idea is that we begin with unions of the smallest subsets (the deepest leaf nodes?) before moving up, and it's this that requires the post-order traversal (reverse post-order, moving right-to-left, would also work, I think?). Unless I'm misunderstanding, that's where the efficiency comes from: the closer you get to the root, the less work there is left to do.

I'm not sure how locate_with_selection_function would help here, but I've never used it, and I may well be being dense about that.

Got it. The issue is that the R-tree itself has no user data available at the intermediate nodes, so none of the iterators have a concept of post-order.

I suppose you want to be able to do a map-reduce op on the R-tree, where each intermediate node (aka ParentNode) is reduced to a new polygon (the union of its children), and to do that in a bottom-up fashion. locate_with_selection_function is essentially the reverse: it is a "pre-order" traversal, as it asks whether you intend to traverse a given intermediate node before delving into it.

Here is one suggestion (just thinking aloud): we add a method to the trait SelectionFunction that also signals when we are done with a particular intermediate node.

trait SelectionFunction<...> {
  fn finished_unpacking_parent(...) {}
}

This could(?) then be used to drive a post-order / map-reduce op. Unfortunately, the trait SelectionFunction takes &self rather than &mut self, which seems a bit restrictive in the first place.

Also, note that the root node and the children of intermediate nodes can be accessed via public methods, so we can implement the union algorithm without using the iterators at all.
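
For illustration, a minimal sketch of that (assuming rstar's public node API: RTree::root, ParentNode::children, and the RTreeNode enum), visiting every leaf by recursing on the nodes directly:

use rstar::{ParentNode, RTree, RTreeNode, RTreeObject};

// Recurse into the children of each parent before returning to it,
// i.e. a post-order walk over the tree structure.
fn visit_post_order<T: RTreeObject>(parent: &ParentNode<T>, visit_leaf: &mut impl FnMut(&T)) {
    for child in parent.children() {
        match child {
            RTreeNode::Leaf(value) => visit_leaf(value),
            RTreeNode::Parent(node) => visit_post_order(node, visit_leaf),
        }
    }
}

fn count_leaves<T: RTreeObject>(tree: &RTree<T>) -> usize {
    let mut count = 0;
    visit_post_order(tree.root(), &mut |_| count += 1);
    count
}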

Got it. The issue is that the R-tree itself has no user data available at the intermediate nodes, so none of the iterators have a concept of post-order.
Ahh, yep.

I suppose you want to be able to do a map-reduce op on the R-tree, where each intermediate node (aka ParentNode) is reduced to a new polygon (the union of its children), and to do that in a bottom-up fashion.

This is my understanding too, yes.

Here is one suggestion (just thinking aloud): we add a method to the trait SelectionFunction that also signals when we are done with a particular intermediate node.

trait SelectionFunction<...> {
  fn finished_unpacking_parent(...) {}
}

This could(?) then be used to drive a post-order / map-reduce op. Unfortunately, the trait SelectionFunction takes &self rather than &mut self, which seems a bit restrictive in the first place.

Don't we still have the problem of not knowing where to start, though?

SelectionFunction already has a method should_unpack_parent that tells you when it starts unpacking a parent node. That, along with some nifty usage of a Vec/stack, should let us do it. However, it all seems more complicated than just directly recursing on the RTree nodes.
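
For comparison, a sketch of what the explicit-stack flavour might look like (again only using the public root/children accessors, and just counting elements per subtree for illustration); the recursive version further below is clearly simpler:

use rstar::{ParentNode, RTree, RTreeNode, RTreeObject};

// Iterative bottom-up pass using an explicit stack of frames:
// (parent, index of the next child to process, result accumulated so far).
// A parent is only "finished" once all of its children are finished.
fn count_bottom_up<T: RTreeObject>(tree: &RTree<T>) -> usize {
    let mut stack: Vec<(&ParentNode<T>, usize, usize)> = vec![(tree.root(), 0, 0)];
    let mut result = 0;
    while let Some((parent, child_idx, accum)) = stack.pop() {
        match parent.children().get(child_idx) {
            Some(RTreeNode::Leaf(_)) => stack.push((parent, child_idx + 1, accum + 1)),
            Some(RTreeNode::Parent(node)) => {
                stack.push((parent, child_idx + 1, accum));
                stack.push((node, 0, 0));
            }
            // All children done: this is the post-order "visit" of `parent`;
            // fold its subtree result into the frame below it.
            None => match stack.last_mut() {
                Some(below) => below.2 += accum,
                None => result = accum,
            },
        }
    }
    result
}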

I would support introducing a new trait to support this type of map-reduce-fold if we could brainstorm a little. There are a couple of different use cases here:

  1. Pure reduce. This is the current use case you've mentioned, iiuc; the final output is the same as the R-tree element type T.
  2. Map + reduce. First map T -> R on the data nodes, then reduce bottom-up to a final output R. Note that R may not itself be an RTreeObject, so it's hard to divide this up into two composable ops.
  3. Fold. In iterators (std, rayon) there is also a distinction between reduce and fold: the leaf is not mapped from T -> R, but the penultimate intermediate nodes compute [T] -> R; the rest of the levels still compute [R] -> R. It could be rephrased as map + reduce, but T -> R followed by [R] -> R might be less efficient than converting directly from [T] -> R. This only helps at the bottom-most level, so it may not be worth it for trees.

Now, unfortunately, the naive trait definition I could think of to handle the above seems to involve many intermediate Vec allocations, which would hinder its use in a good implementation of the algorithm. Direct recursion on the nodes seems simpler, as it is more flexible.

Direct recursion on the nodes seems simpler, as it is more flexible.

So something like

use rstar::{ParentNode, RTree, RTreeNode, RTreeObject};

// Fold the leaves of each subtree into a per-subtree result,
// then reduce those results bottom-up towards the root.
fn bottom_up_fold_reduce<T, S, I, F, R>(tree: &RTree<T>, mut init: I, mut fold: F, mut reduce: R) -> S
where
    T: RTreeObject,
    I: FnMut() -> S,
    F: FnMut(S, &T) -> S,
    R: FnMut(S, S) -> S,
{
    fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &mut I, fold: &mut F, reduce: &mut R) -> S
    where
        T: RTreeObject,
        I: FnMut() -> S,
        F: FnMut(S, &T) -> S,
        R: FnMut(S, S) -> S,
    {
        parent
            .children()
            .iter()
            .fold(init(), |accum, child| match child {
                RTreeNode::Leaf(value) => fold(accum, value),
                RTreeNode::Parent(parent) => {
                    let value = inner(parent, init, fold, reduce);

                    reduce(accum, value)
                }
            })
    }

    inner(tree.root(), &mut init, &mut fold, &mut reduce)
}

?
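
As a usage example (a hypothetical one, assuming plain points as tree elements), counting all elements of the tree:

fn main() {
    let tree = rstar::RTree::bulk_load(vec![[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]);
    let count = bottom_up_fold_reduce(
        &tree,
        || 0usize,            // init: each subtree starts from an empty count
        |acc, _leaf| acc + 1, // fold: each leaf contributes one element
        |a, b| a + b,         // reduce: combine finished subtree counts
    );
    assert_eq!(count, 3);
}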

I think what is also nice about this way of doing things is that it has a straightforward parallel version using Rayon, which could come in handy for large data sets:

use rayon::prelude::*;
use rstar::{ParentNode, RTree, RTreeNode, RTreeObject};

fn parallel_bottom_up_fold_reduce<T, S, I, F, R>(tree: &RTree<T>, init: I, fold: F, reduce: R) -> S
where
    T: RTreeObject,
    RTreeNode<T>: Send + Sync,
    S: Send,
    I: Fn() -> S + Send + Sync,
    F: Fn(S, &T) -> S + Send + Sync,
    R: Fn(S, S) -> S + Send + Sync,
{
    fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &I, fold: &F, reduce: &R) -> S
    where
        T: RTreeObject,
        RTreeNode<T>: Send + Sync,
        S: Send,
        I: Fn() -> S + Send + Sync,
        F: Fn(S, &T) -> S + Send + Sync,
        R: Fn(S, S) -> S + Send + Sync,
    {
        parent
            .children()
            .into_par_iter()
            .fold(init, |accum, child| match child {
                RTreeNode::Leaf(value) => fold(accum, value),
                RTreeNode::Parent(parent) => {
                    let value = inner(parent, init, fold, reduce);

                    reduce(accum, value)
                }
            })
            // rayon's fold yields one partial accumulator per split of the
            // children slice; combine those partial results here.
            .reduce(init, reduce)
    }

    inner(tree.root(), &init, &fold, &reduce)
}

Thanks for fleshing out the details, @adamreichold! This looks quite on track; I like the easy Rayon support. Just a few minor comments / points to discuss:

  1. The init parameter should probably take the bounding box AABB by reference, as it is available anyway, and it might help in initializing some structures that help with the reduce/fold (see the sketch after this list).

  2. Is (S, S) -> S as efficient as (&mut S, S) if S is big? Rust may be optimizing it away, as we're just storing the returned value back into the accumulator for the next iteration, but I wasn't too sure.

  3. I wonder if there are use cases where it is more efficient to directly reduce &[S] -> S. Even with polygon clipping, the underlying data structure is better built once for the whole sequence than rebuilt during each iterative reduction. Unfortunately, I don't see a neat way to handle that use case without many Vec allocations.
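
For point 1, the adjusted inner function might look like this (a sketch, assuming ParentNode::envelope() to access the parent's AABB):

fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &mut I, fold: &mut F, reduce: &mut R) -> S
where
    T: RTreeObject,
    I: FnMut(&T::Envelope) -> S,
    F: FnMut(S, &T) -> S,
    R: FnMut(S, S) -> S,
{
    // Seed the accumulator with the bounding box of this subtree.
    let envelope = parent.envelope();
    parent
        .children()
        .iter()
        .fold(init(&envelope), |accum, child| match child {
            RTreeNode::Leaf(value) => fold(accum, value),
            RTreeNode::Parent(parent) => {
                let value = inner(parent, init, fold, reduce);

                reduce(accum, value)
            }
        })
}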

The init parameter should probably take the bounding box AABB by reference, as it is available anyway, and it might help in initializing some structures that help with the reduce/fold.

👍

Is (S, S) -> S as efficient as (&mut S, S) if S is big? Rust may be optimizing it away, as we're just storing the returned value back into the accumulator for the next iteration, but I wasn't too sure.

Both std and rayon seem to pass the accumulator by value. I think, if it were really expensive to move and not optimized properly, one would box it?

I wonder if there are use cases where it is more efficient to directly reduce &[S] -> S. Even with polygon clipping, the underlying data structure is better built once for the whole sequence than rebuilt during each iterative reduction. Unfortunately, I don't see a neat way to handle that use case without many Vec allocations.

I do not know the details of those auxiliary data structures, but maybe moving them into the closures would help? (At least in the serial case; I am not sure whether thread-local storage would be worth it in the parallel case.)

Maybe the above could also accommodate this differently: S could keep auxiliary allocations around, e.g. a Vec<T>, to delay the union computation until the next call to reduce?
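
For example (a hypothetical accumulator type, names invented for illustration):

// `push` and `merge` only move values around and keep allocations alive;
// a real implementation would collapse `pending` via the expensive union
// once it grows past some threshold, instead of on every combination step.
struct Batched<T> {
    pending: Vec<T>,
}

impl<T> Batched<T> {
    fn new() -> Self {
        Batched { pending: Vec::new() }
    }

    // Used as the `fold` step: buffer the leaf value.
    fn push(mut self, value: T) -> Self {
        self.pending.push(value);
        self
    }

    // Used as the `reduce` step: keep the larger buffer to avoid reallocating.
    fn merge(mut self, mut other: Self) -> Self {
        if other.pending.capacity() > self.pending.capacity() {
            std::mem::swap(&mut self.pending, &mut other.pending);
        }
        self.pending.append(&mut other.pending);
        self
    }
}

Driven by the fold/reduce above, this would be roughly init = Batched::new, fold = |acc, leaf| acc.push(leaf.clone()), and reduce = Batched::merge.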

In any case, I think the main takeaway could also be that we do not really need to make these decisions in a general context: the traversal/reduction can be written against the existing rstar API (and I don't see any obvious performance gains from having access to its internals), so when geo implements cascaded unions, it can use code like the above, tailored to its specific use case.

Or would you rather say that this is important/complex enough to be exposed by rstar itself? Should it then also optionally provide the parallel variant?