selecting coordinates by axis with ancillary variables that have different dimensions
keewis opened this issue · 1 comments
For a dataset similar to this:
In [2]: ds = xr.Dataset(
...: {
...: "x": ("x", range(10), {"axis": "X", "ancillary_variables": "position_flag"}),
...: "y": ("y", range(10), {"axis": "Y", "ancillary_variables": "position_flag"}),
...: "position_flag": ("position", range(10)),
...: }
...: )
...: ds
Out[2]:
<xarray.Dataset>
Dimensions: (x: 10, y: 10, position: 10)
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9
* y (y) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: position
Data variables:
position_flag (position) int64 0 1 2 3 4 5 6 7 8 9
getting a single coordinate by axis fails:
In [3]: ds.cf["X"]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 1
----> 1 ds.cf["X"]
File .../lib/python3.10/site-packages/cf_xarray/accessor.py:2142, in CFDatasetAccessor.__getitem__(self, key)
2110 def __getitem__(self, key: Hashable | Iterable[Hashable]) -> DataArray | Dataset:
2111 """
2112 Index into a Dataset making use of CF attributes.
2113
(...)
2140 Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
2141 """
-> 2142 return _getitem(self, key)
File .../lib/python3.10/site-packages/cf_xarray/accessor.py:846, in _getitem(accessor, key, skip)
844 coords.remove(allnames[0])
845 for k1 in coords:
--> 846 da.coords[k1] = ds.variables[k1]
847 return da
848 else:
File .../lib/python3.10/site-packages/xarray/core/coordinates.py:41, in Coordinates.__setitem__(self, key, value)
40 def __setitem__(self, key: Hashable, value: Any) -> None:
---> 41 self.update({key: value})
File .../lib/python3.10/site-packages/xarray/core/coordinates.py:172, in Coordinates.update(self, other)
168 self._maybe_drop_multiindex_coords(set(other_vars))
169 coords, indexes = merge_coords(
170 [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
171 )
--> 172 self._update_coords(coords, indexes)
File .../lib/python3.10/site-packages/xarray/core/coordinates.py:390, in DataArrayCoordinates._update_coords(self, coords, indexes)
388 dims = calculate_dimensions(coords_plus_data)
389 if not set(dims) <= set(self.dims):
--> 390 raise ValueError(
391 "cannot add coordinates with new dimensions to a DataArray"
392 )
393 self._data._coords = coords
395 # TODO(shoyer): once ._indexes is always populated by a dict, modify
396 # it to update inplace instead.
ValueError: cannot add coordinates with new dimensions to a DataArray
which makes sense, because position_flag
has a different dimension than x
(using ds.cf[["X"]]
works as expected because that ignores the ancillary variables).
Am I doing something wrong, is the dataset just not complying to the CF conventions (I don't know enough to be able to tell), or should we change the way ds.cf[axis]
and ds.cf[[axis]]
work (for example, make ds.cf[axis]
return a Dataset
and ds.cf[[axis]]
a DataArray
, or the other way around)?
s the dataset just not complying to the CF conventions
It's the other way, this is where Xarray's data model is too simple to keep up, parrticularly the restriction that we cannot add arbitrary new dimensions to a DataArray. I proposed relaxing this for Indexes, but not in general.
s should we change the way ds.cf[axis] and ds.cf[[axis]] work (for example, make ds.cf[axis] return a Dataset and ds.cf[[axis]] a DataArray, or the other way around)?
No. An old version of cf-xarray did this and I removed it quite quickly. We really want type-stability on the return value, and we want to match Xarray's behaviour.
In any case, If we returned a Dataset
with x
and position
, how would you know which was X
? You could use cf.axes["X"][0]
I guess, but that's too wordy.
> for k1 in coords:
--> 846 da.coords[k1] = ds.variables[k1]
I think we should check dimensions here before assignment, and silently drop positions
. It'd be nice to add a logger
integration so that we can record such things.