ecmwf/polytope

Can this be used to optimize the extraction of grid point timeseries from grib files?

guidocioni opened this issue · 5 comments

What maintenance does this project need?

I just discovered this project today and it seems to a address a problem that I had for a really long time.

For example, when reading a grib file from ecmwf ensemble with xarray

Screenshot 2023-06-26 at 15 35 49

attempting to extract a grid point with

.sel(lon=45, lat=10, method='nearest')

causes excessive memory/CPU usage and eventually takes too long.

My solution for the moment was to transfer this burden to cdo

filename = cdo.remapnn('lon=45/lat=10', input='-merge /ecmwf-ens/*.grib2', options='-P 12')
d = xr.open_dataset(filename, engine='cfgrib').squeeze().copy()

but this is far away from optimal.

Could this be solved using polytope to subset the grib file readed with xarray? As far as I know grib files do not support random access.
Do you have any example on how to extract a point in lat/lon? The documentation only shows how to extract boxes. I would like to test a little bit and see if there's any advantage over the "hybrid" CDO method :)

Organisation

No response

Hi @guidocioni, at the moment Polytope won't be able to help with the resources problem, because internally it uses the sel method and needs to load the xarray too. In the future we will support directly reading from GRIB with random access; this is currently work in progress.

To extract a single point you need to use the Select shape on each axis.

Request(
    Select("lat", 10.0),
    Select("lon", 45.0)
)

This is effectively a 0-polytope, so you are asking the algorithm to find all points within, or on the edge of, an infinitely small shape. This does work, but its not equivalent to method="nearest". If the point isn't within that shape it won't be returned, and there is often floating point imprecision which means the coordinate in the datacube isn't quite the same as the coordinate you request.

You may need to revert to requesting a small box around the expected point in this situation.

Hi @guidocioni, at the moment Polytope won't be able to help with the resources problem, because internally it uses the sel method and needs to load the xarray too. In the future we will support directly reading from GRIB with random access; this is currently work in progress.

To extract a single point you need to use the Select shape on each axis.

Request(
    Select("lat", 10.0),
    Select("lon", 45.0)
)

This is effectively a 0-polytope, so you are asking the algorithm to find all points within, or on the edge of, an infinitely small shape. This does work, but its not equivalent to method="nearest". If the point isn't within that shape it won't be returned, and there is often floating point imprecision which means the coordinate in the datacube isn't quite the same as the coordinate you request.

You may need to revert to requesting a small box around the expected point in this situation.

Thanks, that clarifies a little bit.

Would that change in case I'm Reading netcdf instead than grib files?

Also, if polytope uses sel under the hood what is the advantage of using this instead than simply xarray? or is it meant to ease writing down conditions to select complex areas (like rings or weird shapes)?

It'll be the same for netcdf, it still needs to be converted to xarray for use in Polytope.

Generally for points or orthogonal boxes, Polytope isn't going to help. It's more for complex shapes. Though once we have proper GRIB support (probably via the FDB https://github.com/ecmwf/fdb), it will provide performant direct-to-point random access.

It'll be the same for netcdf, it still needs to be converted to xarray for use in Polytope.

Generally for points or orthogonal boxes, Polytope isn't going to help. It's more for complex shapes. Though once we have proper GRIB support (probably via the FDB https://github.com/ecmwf/fdb), it will provide performant direct-to-point random access.

Ok sounds good!
I'll keep an eye out then.
thanks

I should also note that the GRIB/FDB support will also bring support for semi-regular non-cartesian grids (such as Gaussian grids), which is another current limitation of the xarray approach. Polytope will be able to help with extraction of points/boxes from these types of grid.

Cheers