xarray-contrib/xoak

Add k-d tree index

Opened this issue · 4 comments

It would be interesting to test/benchmark the ball tree + haversine approach versus the kd-tree + conversion lat/lon spherical -> xyz cartersian.

The second approach is also used by libpysal, see here.

Since we already rely on scikit-learn for the ball tree, would it make sense to reuse scikit-learn's KDTree instead of Scipy's cKDTree?

would it make sense to reuse scikit-learn's KDTree instead of Scipy's cKDTree?

Yes, definitely!

It would be interesting to test/benchmark the ball tree + haversine approach versus the kd-tree + conversion lat/lon spherical -> xyz cartersian.

I'm not sure I understand why we should not also do the ball tree benchmark with lat/lon to spherical and the kd-tree with haversine. I'd go for a matrix like

tree:
  - ball tree
  - kd-tree
transform_metric:
  - (deg2rad, haversine)
  - (to sphere, euclidian)

(Sorry, hit send to early.)

I don't think we can use k-d tree with lat/lon haversine: https://news.ycombinator.com/item?id=9281998

The rest of the matrix is valid. Ball tree works with any metric that respects the triangle inequality.

Actually, there is already detailed comparison + benchmarks in Jake's blog post. But it'd still be interesting to see it applied to NEMO/FESOM spatial data patterns and the metrics/conversions used here.

Mmm now I'm confused about the validity of kd-tree + haversine. I'm wondering why haversine is not in the list of valid metrics for scikit-learn's KDTree.

We should might also consider the implementation in pykdtree.