Support re-projecting of coordinates
Closed this issue · 4 comments
I believe that we should be able to re-project coordinates and update the CRS in the index accordingly. It would mean depending on pyproj on top of shapely and xarray, which I assume is fine anyway. We should probably even store the CRS in ShapelySTRTreeIndex.crs as a pyproj.CRS object anyway. Right now, we sort of assume that, as our examples use geopandas.GeometryArray, which comes with a pyproj.CRS, but if you pass a numpy.array of shapely geoms, it can be anything. I'd follow the geopandas example here and use pyproj to handle that and enable to_crs().
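A minimal sketch of what re-projecting a plain array of shapely geometries with pyproj could look like, assuming Shapely 2.0 or later for the vectorized shapely.transform. The function name reproject_geometries is hypothetical and is not part of xvec; the source CRS is passed explicitly because a bare numpy array carries no CRS information.

```python
import numpy as np
import pyproj
import shapely


def reproject_geometries(geoms, source_crs, target_crs):
    """Re-project an array of shapely geometries (hypothetical helper)."""
    # always_xy=True forces (x, y) / (lon, lat) axis order regardless of
    # the axis order declared by the CRS definition.
    transformer = pyproj.Transformer.from_crs(
        source_crs, target_crs, always_xy=True
    )

    def _apply(coords):
        # shapely.transform passes an (N, 2) array of coordinates and
        # expects an array of the same shape back.
        x, y = transformer.transform(coords[:, 0], coords[:, 1])
        return np.column_stack([x, y])

    return shapely.transform(geoms, _apply)


points = shapely.points([[0.0, 0.0], [1.0, 1.0]])
projected = reproject_geometries(points, "EPSG:4326", "EPSG:3857")
```

After such a call, the CRS stored alongside the geometries (for example on the index) would need to be updated to the target CRS as well.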
I am less sure about the actual API. Maybe via a custom .vec accessor? Something like DataArray.vec.coords_to_crs(coordinates, target_crs, **coords_kwargs) wrapping DataArray.assign_coords?
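To make the accessor idea concrete, here is a rough sketch built on xarray's register_dataarray_accessor. Everything about the accessor itself is an assumption for illustration: the class name, the simplified signature (a single coordinate name plus an explicit source_crs, since this sketch has no index to read the CRS from), and the choice to return a new object through assign_coords.

```python
import numpy as np
import pyproj
import shapely
import xarray as xr


@xr.register_dataarray_accessor("vec")
class VecAccessor:
    """Hypothetical accessor; not the actual xvec API."""

    def __init__(self, obj):
        self._obj = obj

    def coords_to_crs(self, coord_name, source_crs, target_crs):
        transformer = pyproj.Transformer.from_crs(
            source_crs, target_crs, always_xy=True
        )

        def _apply(coords):
            # (N, 2) coordinates in, (N, 2) coordinates out
            x, y = transformer.transform(coords[:, 0], coords[:, 1])
            return np.column_stack([x, y])

        coord = self._obj[coord_name]
        new_geoms = shapely.transform(coord.values, _apply)
        # assign_coords returns a new DataArray; the original is untouched
        return self._obj.assign_coords({coord_name: (coord.dims, new_geoms)})


da = xr.DataArray(
    [10.0, 20.0],
    dims="geom",
    coords={"geom": shapely.points([[0.0, 0.0], [1.0, 1.0]])},
)
reprojected = da.vec.coords_to_crs("geom", "EPSG:4326", "EPSG:3857")
```

In a real implementation the source CRS would presumably be read from the index rather than passed in, and the new index would carry the target CRS.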
I wonder if it would be worth collaborating with odc-geo for this? The scope differs a bit (raster + crs vs geometries + crs), but it should be close enough? The only caveat is that it builds on pre-2.0 shapely, so as far as I can tell it supports single geometries only at the moment.
It would mean depending on pyproj on top of shapely and xarray which I assume is fine anyway.
Yes, even depending on GeoPandas would be fine, since I guess users will often convert between vector data cubes and geo dataframes.
Maybe via a custom .vec accessor?
Yes, a custom accessor would certainly be a nice addition here. To go even further, we could probably replicate most of the geopandas.GeoSeries and geopandas.GeoDataFrame API in DataArray and Dataset accessors, respectively. For consistency, we might want to provide an API here that is close to GeoPandas when possible.
Which name should we choose for the accessor? Is vec safe enough regarding possible conflicts with coordinate or data variable names?
We should probably even store CRS ShapelySTRTreeIndex.crs as pyproj.CRS object anyway. Right now, we sort of assume that as our examples use geopandas.GeometryArray that comes with pyproj.CRS.
I guess it would be OK to support different ways of getting the CRS information from the geometry coordinate. I would suggest the following (in order of precedence):
- From the index, if the coordinate is backed by an xvec custom index
- From the wrapped geopandas.GeometryArray object (once wrapping such an array as a dimension coordinate is supported by Xarray)
- From the “crs” coordinate attribute (if any): anything serializable that is supported by pyproj.CRS.from_user_input()
- Any default CRS? Or an undefined CRS?
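The order of precedence above could be sketched as a small lookup function. This is only an illustration: the function name resolve_crs and its parameters are hypothetical, and the fallback to a default (or to None for an undefined CRS) reflects the open question in the last bullet rather than a decision.

```python
def resolve_crs(index_crs=None, array_crs=None, attrs=None, default=None):
    """Return the first available CRS input, in order of precedence.

    Hypothetical helper: each argument stands for one of the possible
    sources listed above. Returns None if no CRS is defined anywhere.
    """
    candidates = (
        index_crs,                # 1. CRS stored on a custom index
        array_crs,                # 2. CRS of a wrapped GeometryArray
        (attrs or {}).get("crs"),  # 3. "crs" coordinate attribute
        default,                  # 4. optional default CRS
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


# The index wins over the attribute when both are present.
chosen = resolve_crs(index_crs="EPSG:4326", attrs={"crs": "EPSG:3857"})
```

Whatever value is found could then be normalized with pyproj.CRS.from_user_input(), so that downstream code always works with a pyproj.CRS object.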
@keewis odc-geo looks interesting indeed. There are notable differences between raster and vector data cubes regarding their dimensions, coordinates and indexes, though. Do you have some examples in mind of how odc-geo and xvec could work together?
Not really, I just thought that since xvec is trying to do shapely + crs and odc-geo is doing something similar (just single geometries at the moment, though), it might be a good idea to make sure both libraries are aware of each other, even if in the end that's all that happens.
One example I could imagine, though, is to use odc-geo to extract a GeoBox (extent, resolution, and crs) from a Dataset, convert that to a list of cell geometries, and use that to do the conservative regridding from the pangeo discourse (although that might be restricted to regular rectilinear grids at the moment).
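As a rough illustration of the "cell geometries" step only, the sketch below builds one polygon per cell of a regular, north-up grid from an origin, a resolution and a shape. It deliberately does not use odc-geo: the function name grid_cell_polygons and its parameters are hypothetical stand-ins for the information a GeoBox would provide, and it assumes Shapely 2.0 or later for the vectorized shapely.box.

```python
import numpy as np
import shapely


def grid_cell_polygons(xmin, ymax, resolution, shape):
    """Return an (nrows, ncols) array of square cell polygons.

    Hypothetical helper for a regular north-up grid whose top-left
    corner is at (xmin, ymax).
    """
    nrows, ncols = shape
    cols, rows = np.meshgrid(np.arange(ncols), np.arange(nrows))
    left = xmin + cols * resolution
    top = ymax - rows * resolution
    # shapely.box broadcasts over array inputs: (xmin, ymin, xmax, ymax)
    return shapely.box(left, top - resolution, left + resolution, top)


cells = grid_cell_polygons(xmin=0.0, ymax=2.0, resolution=1.0, shape=(2, 3))
```

The flattened array (cells.ravel()) could then serve as the geometry coordinate against which overlap weights for a conservative regridding are computed.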