georust/rstar

attempt to add with overflow

stepancheg opened this issue · 6 comments

Probably related to #138.

Min repro:

#[cfg(test)]
mod tests {
    use rstar::RTree;

    #[test]
    fn test() {
        let mut rtree = RTree::new();

        rtree.insert([-1231069110, 165759396]);
        rtree.insert([781046969, 1474650861]);
        rtree.insert([2044084841, -178727451]);
        rtree.insert([477217386, 1765868819]);
        rtree.insert([1257259285, -1226428015]);
        rtree.insert([-1802029933, -488481506]);
        rtree.insert([-1786107855, 2050884187]);
    }
}

Integer overflow is on this line:

self.lower.component_wise(&self.upper, |x, y| (x + y) / two)

The fix is probably to write this as:

x + (y - x) / two

Unless I'm misunderstanding your suggestion, your fix + test combo results in an error in debug mode: attempt to subtract with overflow: master...overflowfix

Yes!

w14 commented

What is the solution to this?

I think there is none so far besides using a larger/signed data type.

I suspect a fix would entail inline the computation of the envelope center into the closure used by get_nodes_for_reinsertion (because the centers of children are necessary in bounds to the envelope of their parent), but this does not really work as Envelope::center is a trait method where we cannot rely on a general implementation.

Maybe we can introduce something like Envelope::sort_envelopes_center in analogy to the existing Envelope::sort_envelopes which would default to the implement currently hard-coded into get_nodes_for_reinsertion? Let me give that a try...

I don't think this will work but also don't think for the test case itself, anything but widening it to i64 can be done.

So I replaced the body of get_nodes_for_reinsertion with a new trait method on Envelope, i.e.

fn sort_envelopes_by_center<T: RTreeObject<Envelope = Self>>(&self, envelopes: &mut [T]) {
    let one = <Self::Point as Point>::Scalar::one();
    let two = one + one;

    envelopes.sort_by(|l, r| {
        let l = l.envelope();
        let r = r.envelope();

        let l = Self::Point::generate(|axis| {
            let x1 = l.lower.nth(axis);
            let y1 = l.upper.nth(axis);
            let x2 = self.lower.nth(axis);
            let y2 = self.upper.nth(axis);

            // x2 <= x1 <= y1 <= y2
            // (x1 + y1) / two - (x2 + y2) / two
            // x1 + (y1 - x1) / two - x2 - (y2 - x2) / two

            (x1 / two - x2 / two) + (y1 / two - y2 / two)
        })
        .length_2();

        let r = Self::Point::generate(|axis| {
            let x1 = r.lower.nth(axis);
            let y1 = r.upper.nth(axis);
            let x2 = self.lower.nth(axis);
            let y2 = self.upper.nth(axis);

            (x1 / two - x2 / two) + (y1 / two - y2 / two)
        })
        .length_2();

        l.partial_cmp(&r).unwrap()
    });
}

and this does actually through computing the center distance coordinates, only to blow up on the accumulator later when computing length_2.

That said, the given points use just too much of the range of i32, i.e not even the pairwise differences can be represented in i32 meaning that already

let points = [
    [-1231069110, 165759396],
    [781046969, 1474650861],
    [2044084841, -178727451],
    [477217386, 1765868819],
    [1257259285, -1226428015],
    [-1802029933, -488481506],
    [-1786107855, 2050884187],
];

for l in points {
    for r in points {
        l.sub(&r);
    }
}

will overflow which to me suggests there is nothing that can be done but widening the coordinates to i64.

@w14 I think you could still give the branch sort-envelopes-by-center a try because it should avoid intermediate negative coordinates, e.g. use a Git dependency like

rstar = { git = "https://github.com/georust/rstar.git", branch = "sort-envelopes-by-center" }

but I don't think we can commit this implementation as it requires more operations than required in the common floating point case. We might want to provide the trait method (but without a an implementation that is different from what we have now) though so that it can be overwritten in downstream crates which want to use unsigned integers though?

To be honest, I think the only reasonable course of action for this particular issue to close this as requiring a larger coordinate type like i64.