Investigate performance on long timeseries

Question


BSchilperoort opened this issue a year ago · 2 comments

Your work in reimplementing regridding methods is interesting! I often find ESMF to be a bit heavy on the dependencies side, indeed. But one crucial feature that makes it better than xr.interp is that it computes the weights first and then applies them in parallel to all spatial slices. This is a decisive advantage when dealing with long timeseries! Do you have plans to implement something like that in pure xarray?

Originally posted by @aulemahal in pangeo-data/xESMF#282 (comment)

Answer 1 · 2023-10-09T13:15:44.000Z

It should be easy to compute the weights once and then apply them, the same way the conservative method currently does. This could improve performance for long timeseries.
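A minimal NumPy sketch of the "compute weights once, apply to every slice" idea for 1-D linear interpolation, assuming a strictly increasing source grid and targets inside its range. `linear_weights` is a hypothetical helper written for illustration; it is not part of xarray-regrid, and the conservative method there may build and apply its weights differently.

```python
import numpy as np


def linear_weights(x_in, x_out):
    """Return a dense (len(x_out), len(x_in)) matrix of 1-D linear weights.

    Assumes x_in is strictly increasing and every x_out lies inside
    [x_in[0], x_in[-1]]; extrapolation and NaNs are not handled.
    """
    x_in = np.asarray(x_in, dtype=float)
    x_out = np.asarray(x_out, dtype=float)
    # Left neighbour of each target point, clipped so that i + 1 stays in range.
    i = np.clip(np.searchsorted(x_in, x_out, side="right") - 1, 0, x_in.size - 2)
    # Fractional distance of each target point between its two neighbours.
    frac = (x_out - x_in[i]) / (x_in[i + 1] - x_in[i])
    weights = np.zeros((x_out.size, x_in.size))
    rows = np.arange(x_out.size)
    weights[rows, i] = 1.0 - frac
    weights[rows, i + 1] = frac
    return weights


# Synthetic example: 1000 time steps on a 1-D source grid of 21 points.
rng = np.random.default_rng(0)
x_src = np.linspace(0.0, 10.0, 21)
x_dst = np.linspace(0.5, 9.5, 7)
data = rng.normal(size=(1000, x_src.size))  # dims: (time, x)

# Step 1: build the weights once; they depend only on the two grids.
w = linear_weights(x_src, x_dst)

# Step 2: apply them to every time slice in a single matrix product.
regridded = data @ w.T  # dims: (time, x_dst)
```

Each row of the weight matrix has only two nonzero entries, so for large grids a sparse matrix would be the more realistic choice; on labelled arrays the same contraction could be written with `xr.dot`.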

Answer 2 · 2024-09-26T14:17:47.000Z

I don't think this is actually an issue due to the way interpolation is applied independently across the dimensions. In some quick benchmarks on a longer timeseries, regrid.linear is anywhere from 1.2x to 4x faster than regrid.conservative depending on the chunk scheme. So I doubt we would get any improvement by instead generating our own weights and doing .dot().
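A NumPy sketch of what "applied independently across the dimensions" can mean on a rectilinear grid: bilinear regridding of a (time, y, x) array is one 1-D linear pass along y followed by one along x, and each pass is a vectorized gather-and-blend that covers every time step at once. `interp_along_axis` is a hypothetical helper for illustration only, not xarray's or xarray-regrid's internal code path, and it does no extrapolation or NaN handling.

```python
import numpy as np


def interp_along_axis(data, x_in, x_out, axis):
    """Linearly interpolate `data` from x_in to x_out along one axis.

    All other axes (e.g. time) are carried along by broadcasting, so a
    single call covers every slice. Assumes x_in is strictly increasing
    and x_out lies inside its range; no extrapolation or NaN handling.
    """
    i = np.clip(np.searchsorted(x_in, x_out, side="right") - 1, 0, len(x_in) - 2)
    frac = (x_out - x_in[i]) / (x_in[i + 1] - x_in[i])
    # Reshape the fractions so they broadcast along the interpolated axis only.
    shape = [1] * data.ndim
    shape[axis] = len(x_out)
    frac = frac.reshape(shape)
    # Gather the left and right neighbours along `axis` for all other axes at once.
    left = np.take(data, i, axis=axis)
    right = np.take(data, i + 1, axis=axis)
    return left * (1.0 - frac) + right * frac


# Synthetic (time, y, x) field on a regular lat/lon-like grid.
rng = np.random.default_rng(1)
y_src = np.linspace(-90.0, 90.0, 19)
x_src = np.linspace(0.0, 350.0, 36)
y_dst = np.linspace(-80.0, 80.0, 9)
x_dst = np.linspace(5.0, 345.0, 18)
data = rng.normal(size=(500, y_src.size, x_src.size))

# Two independent 1-D passes (y, then x) give bilinear interpolation.
out = interp_along_axis(data, y_src, y_dst, axis=1)
out = interp_along_axis(out, x_src, x_dst, axis=2)
```

Because each pass only reads the two neighbouring source points along one axis, its cost is already linear in the data size, which is one plausible reason why precomputing a weight matrix and calling `.dot()` may not pay off for the linear method.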

Plus it's pretty cool that xarray_regrid/methods/interp.py is only 52 lines, most of which are type overloads 😄