xarray-contrib/xvec

Support re-projecting of coordinates

Closed this issue · 4 comments

I believe that we should be able to re-project coordinates and update the CRS in the index accordingly. It would mean depending on pyproj on top of shapely and xarray, which I assume is fine anyway. We should probably store ShapelySTRTreeIndex.crs as a pyproj.CRS object in any case. Right now we more or less assume that, since our examples use geopandas.GeometryArray, which comes with a pyproj.CRS, but if you pass a numpy.array of shapely geoms, the CRS can be anything. I'd follow the geopandas example here: use pyproj to handle the CRS and enable to_crs().
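A minimal sketch of what re-projecting an array of shapely geometries with pyproj could look like, assuming shapely 2.0 (for the vectorized `shapely.transform`). The function name `to_crs` and its return value are illustrative assumptions, not an existing xvec API:

```python
import numpy as np
import pyproj
import shapely


def to_crs(geoms, source_crs, target_crs):
    """Re-project an array of shapely geometries (hypothetical helper)."""
    # Accept anything pyproj understands: EPSG codes, WKT, CRS objects, ...
    source = pyproj.CRS.from_user_input(source_crs)
    target = pyproj.CRS.from_user_input(target_crs)
    # always_xy=True keeps (x, y) / (lon, lat) order regardless of CRS axis order
    transformer = pyproj.Transformer.from_crs(source, target, always_xy=True)

    def _reproject(xy):
        # shapely.transform passes an (N, 2) array of coordinates
        x, y = transformer.transform(xy[:, 0], xy[:, 1])
        return np.column_stack([x, y])

    # Return the new geometries together with the CRS they are now in
    return shapely.transform(geoms, _reproject), target


points = np.array([shapely.Point(0, 0), shapely.Point(1, 1)], dtype=object)
projected, new_crs = to_crs(points, "EPSG:4326", "EPSG:3857")
```

Returning the target CRS alongside the geometries is one way to keep the two in sync; an index holding a `crs` attribute would need to be rebuilt with the new value.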

I am less sure about the actual API. Maybe via a custom .vec accessor? Something like DataArray.vec.coords_to_crs(coordinates, target_crs, **coords_kwargs), wrapping DataArray.assign_coords?
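To make the accessor idea concrete, here is a rough sketch using `xarray.register_dataarray_accessor`. It simplifies the signature above to a single coordinate name plus a target CRS, and it assumes the source CRS is stored in the coordinate's "crs" attribute; both choices are assumptions for illustration, not a settled design:

```python
import numpy as np
import pyproj
import shapely
import xarray as xr


@xr.register_dataarray_accessor("vec")
class VecAccessor:
    """Hypothetical accessor; name and method signature are illustrative."""

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def coords_to_crs(self, coord_name, target_crs):
        coord = self._obj[coord_name]
        # Assumption: the source CRS lives in the coordinate's "crs" attribute
        source = pyproj.CRS.from_user_input(coord.attrs["crs"])
        target = pyproj.CRS.from_user_input(target_crs)
        transformer = pyproj.Transformer.from_crs(source, target, always_xy=True)

        def _reproject(xy):
            x, y = transformer.transform(xy[:, 0], xy[:, 1])
            return np.column_stack([x, y])

        new_geoms = shapely.transform(coord.values, _reproject)
        attrs = {**coord.attrs, "crs": target.to_string()}
        # Wrap assign_coords so the result is a new object, as in xarray
        return self._obj.assign_coords({coord_name: (coord.dims, new_geoms, attrs)})


geoms = np.array([shapely.Point(0, 0), shapely.Point(1, 1)], dtype=object)
da = xr.DataArray(
    [10.0, 20.0],
    dims="geom",
    coords={"geom": ("geom", geoms, {"crs": "EPSG:4326"})},
)
projected = da.vec.coords_to_crs("geom", "EPSG:3857")
```

Because this goes through `assign_coords`, the original `da` is left untouched and keeps its EPSG:4326 coordinate.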

I wonder if it would be worth collaborating with odc-geo for this? The scope differs a bit (raster + crs vs geometries + crs), but it should be close enough? The only caveat is that it builds on pre-2.0 shapely, so as far as I can tell it handles single geometries only at the moment.

It would mean depending on pyproj on top of shapely and xarray which I assume is fine anyway.

Yes, even depending on GeoPandas would be fine, since I guess users will often convert between vector data cubes and geo dataframes.

Maybe via a custom .vec accessor?

Yes a custom accessor would certainly be a nice addition here. To go even further we could probably replicate most of the geopandas.GeoSeries and geopandas.GeoDataFrame API in DataArray and Dataset accessors respectively. For consistency we might want to provide an API here that is close to GeoPandas, when possible.

Which name should we choose for the accessor? Is vec safe enough regarding possible conflicts with coordinate or data variable names?

We should probably even store CRS ShapelySTRTreeIndex.crs as pyproj.CRS object anyway. Right now, we sort of assume that as our examples use geopandas.GeometryArray that comes with pyproj.CRS.

I guess it would be OK to support different ways of getting the CRS information from the geometry coordinate. I would suggest the following (in order of precedence):

  • From the index if the coordinate is backed by an xvec custom index
  • From the wrapped geopandas.GeometryArray object (once wrapping such array as a dimension coordinate is supported by Xarray)
  • From the “crs” coordinate attribute (if any): anything serializable that is supported by pyproj.CRS.from_user_input()
  • Any default CRS? Or an undefined CRS?
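The order of precedence above could be sketched roughly as follows. The function `resolve_crs` and its parameters are hypothetical, and the index and array are only assumed to expose a `crs` attribute; the last bullet is handled here by an optional default, with `None` standing in for an undefined CRS:

```python
from types import SimpleNamespace

import pyproj


def resolve_crs(index=None, array=None, attrs=None, default=None):
    """Return the first CRS found, following the order of precedence above."""
    candidates = (
        getattr(index, "crs", None),   # 1. custom index, if any
        getattr(array, "crs", None),   # 2. wrapped array exposing .crs
        (attrs or {}).get("crs"),      # 3. "crs" coordinate attribute
        default,                       # 4. optional default
    )
    for candidate in candidates:
        if candidate is not None:
            # Normalize whatever was stored into a pyproj.CRS object
            return pyproj.CRS.from_user_input(candidate)
    return None  # undefined CRS


# Stand-in for an index carrying a CRS (not a real xvec index)
index = SimpleNamespace(crs=pyproj.CRS.from_epsg(3857))
from_index = resolve_crs(index=index, attrs={"crs": "EPSG:4326"})
from_attrs = resolve_crs(attrs={"crs": "EPSG:4326"})
undefined = resolve_crs()
```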

@keewis odc-geo looks interesting indeed. There are notable differences between raster and vector data cubes regarding their dimensions, coordinates and indexes, though. Do you have some examples in mind on how odc-geo and xvec could work together?

Not really, I just thought that since xvec is trying to do shapely + crs and odc-geo is doing something similar (just single geometries at the moment, though) it might be a good idea to make sure both libraries are aware of each other, even if in the end that's all that happens.

One example I could imagine, though, is to use odc-geo to extract a GeoBox (extent, resolution, and crs) from a Dataset, convert that to a list of cell geometries, and use that to do the conservative regridding from the pangeo discourse (although that might be restricted to regular rectilinear grids at the moment).
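As a rough illustration of the "list of cell geometries" step for a regular rectilinear grid, here is a sketch that uses only numpy and shapely 2.0. It does not use odc-geo at all; the extent and resolution are passed in by hand, and `grid_cells` is a hypothetical helper:

```python
import numpy as np
import shapely


def grid_cells(xmin, ymin, xmax, ymax, resolution):
    """Build one box polygon per cell of a regular grid (hypothetical helper)."""
    # Lower-left corner of every cell along each axis
    xs = np.arange(xmin, xmax, resolution)
    ys = np.arange(ymin, ymax, resolution)
    x0, y0 = np.meshgrid(xs, ys)
    x0, y0 = x0.ravel(), y0.ravel()
    # shapely.box is vectorized in shapely 2.0 and returns an array of polygons
    return shapely.box(x0, y0, x0 + resolution, y0 + resolution)


# A 2.0 x 1.0 extent with 0.5 resolution gives a 4 x 2 grid of cells
cells = grid_cells(0.0, 0.0, 2.0, 1.0, 0.5)
```

The resulting array of polygons could then serve as a geometry coordinate, with the CRS taken from wherever the grid definition came from.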