xarray-contrib/cf-xarray

selecting coordinates by axis with ancillary variables that have different dimensions

keewis opened this issue · 1 comments

For a dataset similar to this:

In [2]: ds = xr.Dataset(
   ...:     {
   ...:         "x": ("x", range(10), {"axis": "X", "ancillary_variables": "position_flag"}),
   ...:         "y": ("y", range(10), {"axis": "Y", "ancillary_variables": "position_flag"}),
   ...:         "position_flag": ("position", range(10)),
   ...:     }
   ...: )
   ...: ds
Out[2]: 
<xarray.Dataset>
Dimensions:        (x: 10, y: 10, position: 10)
Coordinates:
  * x              (x) int64 0 1 2 3 4 5 6 7 8 9
  * y              (y) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: position
Data variables:
    position_flag  (position) int64 0 1 2 3 4 5 6 7 8 9

getting a single coordinate by axis fails:

In [3]: ds.cf["X"]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 ds.cf["X"]

File .../lib/python3.10/site-packages/cf_xarray/accessor.py:2142, in CFDatasetAccessor.__getitem__(self, key)
   2110 def __getitem__(self, key: Hashable | Iterable[Hashable]) -> DataArray | Dataset:
   2111     """
   2112     Index into a Dataset making use of CF attributes.
   2113 
   (...)
   2140     Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
   2141     """
-> 2142     return _getitem(self, key)

File .../lib/python3.10/site-packages/cf_xarray/accessor.py:846, in _getitem(accessor, key, skip)
    844         coords.remove(allnames[0])
    845     for k1 in coords:
--> 846         da.coords[k1] = ds.variables[k1]
    847     return da
    848 else:

File .../lib/python3.10/site-packages/xarray/core/coordinates.py:41, in Coordinates.__setitem__(self, key, value)
     40 def __setitem__(self, key: Hashable, value: Any) -> None:
---> 41     self.update({key: value})

File .../lib/python3.10/site-packages/xarray/core/coordinates.py:172, in Coordinates.update(self, other)
    168 self._maybe_drop_multiindex_coords(set(other_vars))
    169 coords, indexes = merge_coords(
    170     [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
    171 )
--> 172 self._update_coords(coords, indexes)

File .../lib/python3.10/site-packages/xarray/core/coordinates.py:390, in DataArrayCoordinates._update_coords(self, coords, indexes)
    388 dims = calculate_dimensions(coords_plus_data)
    389 if not set(dims) <= set(self.dims):
--> 390     raise ValueError(
    391         "cannot add coordinates with new dimensions to a DataArray"
    392     )
    393 self._data._coords = coords
    395 # TODO(shoyer): once ._indexes is always populated by a dict, modify
    396 # it to update inplace instead.

ValueError: cannot add coordinates with new dimensions to a DataArray

which makes sense, because position_flag has a different dimension than x (using ds.cf[["X"]] works as expected because that ignores the ancillary variables).

Am I doing something wrong, is the dataset just not complying to the CF conventions (I don't know enough to be able to tell), or should we change the way ds.cf[axis] and ds.cf[[axis]] work (for example, make ds.cf[axis] return a Dataset and ds.cf[[axis]] a DataArray, or the other way around)?

s the dataset just not complying to the CF conventions

It's the other way, this is where Xarray's data model is too simple to keep up, parrticularly the restriction that we cannot add arbitrary new dimensions to a DataArray. I proposed relaxing this for Indexes, but not in general.

s should we change the way ds.cf[axis] and ds.cf[[axis]] work (for example, make ds.cf[axis] return a Dataset and ds.cf[[axis]] a DataArray, or the other way around)?

No. An old version of cf-xarray did this and I removed it quite quickly. We really want type-stability on the return value, and we want to match Xarray's behaviour.
In any case, If we returned a Dataset with x and position, how would you know which was X? You could use cf.axes["X"][0] I guess, but that's too wordy.

>           for k1 in coords:
--> 846         da.coords[k1] = ds.variables[k1]

I think we should check dimensions here before assignment, and silently drop positions. It'd be nice to add a logger integration so that we can record such things.