MITgcm/xmitgcm

open_mdsdataset dimension error

Opened this issue · 8 comments

Hello!
I'm having an issue loading 2D fields from an LLC270 run.
All 3D variables load as expected, but the 2D fields give the error:
ValueError: dimensions ('time', 'j', 'i') must have the same length as the number of data dimensions, ndim=2

The .meta files for these 2D fields look equivalent to those of other runs whose variables I have loaded without issues, e.g.,

 dimList = [
  1080,    1, 1080,
   310,    1,  310
 ];
 dataprec = [ 'float32' ];
 nrecords = [         53 ];
 timeStepNumber = [       8640 ];
 timeInterval = [  7.905600000000E+06  1.036800000000E+07 ];
 missingValue = [ -9.99000000000000E+02 ];
 nFlds = [   53 ];
 fldList = {
 'ETAN    ' 'SIarea  ' 'SIheff  ' 'SIhsnow ' 'SItices ' 'SIhsalt ' 'SIuice  ' 'SIvice  ' 'SHIfwFlx' 'SHIhtFlx' 'SHI_TauX' 'SHI_TauY' 'DETADT2 ' 'PHIBOT  ' 'sIceLoad' 'MXLDEPTH' 'oceSPDep' 'SIatmQnt' 'SIatmFW ' 'oceQnet '
 'oceFWflx' 'oceTAUX ' 'oceTAUY ' 'oceSflux' 'TFLUX   ' 'SFLUX   ' 'EXFtaux ' 'EXFtauy ' 'EXFlwnet' 'EXFswnet' 'EXFswdn ' 'EXFlwdn ' 'EXFqnet ' 'EXFhs   ' 'EXFhl   ' 'EXFevap ' 'EXFpreci' 'EXFatemp' 'SIqnet  ' 'SIqsw   '
 'SIatmQnt' 'SItflux ' 'SIaaflux' 'SIhl    ' 'SIqneto ' 'SIqneti ' 'SIempmr ' 'SIatmFW ' 'SIsnPrcp' 'SIactLHF' 'SIacSubl' 'botTauX ' 'botTauY '
 };
state_2d_set1.0000008640.meta

So I'm at a bit of a loss as to what the issue is. I've checked with the person who generated the data, and there should be only one timestamp (monthly) per .data file. Could someone help me understand where the dimensions ('time', 'j', 'i') come from, and whether there is a workaround that prevents this clash?
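One way to see exactly what a .meta file declares is to parse it. The sketch below is a minimal, stdlib-only parser written for illustration (`parse_meta` is a hypothetical helper, not part of xmitgcm). It assumes the usual MITgcm convention that `dimList` holds one (global size, start, end) triplet per dimension, fastest-varying first, and it reads the metadata quoted above.

```python
import re
from collections import Counter

NUMBER = r"[-+]?\d+(?:\.\d*)?(?:[Ee][-+]?\d+)?"

def parse_meta(text):
    """Summarise an MITgcm .meta file (hypothetical helper, not part of xmitgcm)."""

    def numbers(key):
        # Numbers inside the bracketed list that follows `key =`.
        match = re.search(rf"{key}\s*=\s*\[(.*?)\]", text, re.S)
        return [float(tok) for tok in re.findall(NUMBER, match.group(1))]

    dims = [int(v) for v in numbers("dimList")]
    # dimList holds (global size, start, end) per dimension, fastest-varying first,
    # so reversing the sizes gives a numpy-style (C-order) shape for one record.
    shape = tuple(reversed(dims[0::3]))
    names_block = re.search(r"fldList\s*=\s*\{(.*?)\}", text, re.S)
    names = [n.strip() for n in re.findall(r"'([^']*)'", names_block.group(1))]
    counts = Counter(names)
    return {
        "shape": shape,
        "nrecords": int(numbers("nrecords")[0]),
        "fields": names,
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
    }

# The metadata quoted in the post above.
META = """
 dimList = [
  1080,    1, 1080,
   310,    1,  310
 ];
 dataprec = [ 'float32' ];
 nrecords = [         53 ];
 timeStepNumber = [       8640 ];
 timeInterval = [  7.905600000000E+06  1.036800000000E+07 ];
 missingValue = [ -9.99000000000000E+02 ];
 nFlds = [   53 ];
 fldList = {
 'ETAN    ' 'SIarea  ' 'SIheff  ' 'SIhsnow ' 'SItices ' 'SIhsalt ' 'SIuice  ' 'SIvice  ' 'SHIfwFlx' 'SHIhtFlx' 'SHI_TauX' 'SHI_TauY' 'DETADT2 ' 'PHIBOT  ' 'sIceLoad' 'MXLDEPTH' 'oceSPDep' 'SIatmQnt' 'SIatmFW ' 'oceQnet '
 'oceFWflx' 'oceTAUX ' 'oceTAUY ' 'oceSflux' 'TFLUX   ' 'SFLUX   ' 'EXFtaux ' 'EXFtauy ' 'EXFlwnet' 'EXFswnet' 'EXFswdn ' 'EXFlwdn ' 'EXFqnet ' 'EXFhs   ' 'EXFhl   ' 'EXFevap ' 'EXFpreci' 'EXFatemp' 'SIqnet  ' 'SIqsw   '
 'SIatmQnt' 'SItflux ' 'SIaaflux' 'SIhl    ' 'SIqneto ' 'SIqneti ' 'SIempmr ' 'SIatmFW ' 'SIsnPrcp' 'SIactLHF' 'SIacSubl' 'botTauX ' 'botTauY '
 };
"""

info = parse_meta(META)
print(info["shape"], info["nrecords"], len(info["fields"]))  # (310, 1080) 53 53
print(info["duplicates"])                                    # ['SIatmFW', 'SIatmQnt']
```

So each record is a single 2D slab of 310 × 1080 values. The parse also shows that `SIatmQnt` and `SIatmFW` each appear twice in `fldList`; whether that matters to xmitgcm's reader is not established in this thread, but it is one difference that may be worth checking against the runs that load cleanly.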

Can you share the code you are using to open the data?

Sure. The error comes up with several variations on the basic open_mdsdataset call, including just the basic
state_2d = open_mdsdataset(rootdir+'state_2d_set1/')
and including time info, for example,
state_2d = open_mdsdataset(rootdir+'state_2d_set1/', delta_t=1200, ref_date='1991-12-15 0:0:0')
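As a sanity check on the time arguments, the time of the file above can be worked out by hand. This assumes that xmitgcm places a file at `ref_date` plus its iteration number times `delta_t` seconds; `iteration_to_time` is a hypothetical helper. With `timeStepNumber = 8640` and `delta_t=1200`, the result is 10,368,000 s, which matches the end of `timeInterval` in the .meta file.

```python
from datetime import datetime, timedelta

def iteration_to_time(iteration, delta_t, ref_date):
    """Return ref_date + iteration * delta_t seconds (assumed xmitgcm convention)."""
    return ref_date + timedelta(seconds=iteration * delta_t)

seconds = 8640 * 1200
print(seconds)  # 10368000, the end of timeInterval in the .meta file
print(iteration_to_time(8640, 1200, datetime(1991, 12, 15)))  # 1992-04-13 00:00:00
```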

Hi all, I'm having the same issue. If someone has the solution, I'd appreciate hearing it! :)

Hi @ruth-moorman, does it work if you add the arguments geometry="llc", nx=270?
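A quick size check may help show whether `geometry="llc"` can apply to these files. Assume the standard compact LLC layout, in which each 2D record of an LLC`nx` run is stored as 13 faces of nx × nx points (13·nx rows of nx columns). Under that assumption, a 2D LLC270 record holds 3510 × 270 values, while the `dimList` in the .meta file above describes 310 × 1080. The helper names below are hypothetical.

```python
def llc_compact_shape(nx):
    """Shape of one 2D record in the compact LLC layout: 13 faces of nx x nx."""
    return (13 * nx, nx)

def dimlist_shape(dim_list):
    """numpy-style shape from an MITgcm dimList of (size, start, end) triplets."""
    sizes = dim_list[0::3]  # global size of each dimension, fastest-varying first
    return tuple(reversed(sizes))

meta_shape = dimlist_shape([1080, 1, 1080, 310, 1, 310])  # from the .meta above
expected = llc_compact_shape(270)
print(meta_shape, expected)                                      # (310, 1080) (3510, 270)
print(meta_shape[0] * meta_shape[1], expected[0] * expected[1])  # 334800 947700
```

Because the point counts differ, these files are presumably not in the full compact LLC format (a regional extraction, for example). That would be consistent with using `geometry='curvilinear'` instead.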

Hiya @timothyas, sorry for the long delay here. I ended up not working with that output, but I'm now having the same issue with different output from an LLC540 configuration. Again, the issue occurs only with 2D variables. In this case I know I should be using geometry='curvilinear', and I am (and, again, it works for 3D variables).

So for example I'm calling:

ds = xmitgcm.open_mdsdataset('../llc540_notides_cycle2/results/diags/', grid_dir = '../llc540_notides_cycle2/results/',prefix = ['state_2d_set1'], geometry='curvilinear',delta_t=480, ref_date = '1993-1-1 0:0:0',iters=iterations[0])
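The thread doesn't show how `iterations` was built. One plausible approach, sketched here with a hypothetical helper, is to pull the 10-digit iteration number out of each file name of the form `prefix.NNNNNNNNNN.meta` (as in `state_2d_set1.0000008640.meta` above) and sort the results.

```python
import re

def iterations_for_prefix(filenames, prefix):
    """Sorted iteration numbers for files named <prefix>.<10-digit iter>.meta."""
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d{{10}})\.meta$")
    iters = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            iters.add(int(match.group(1)))
    return sorted(iters)

# With real data one would pass e.g. os.listdir(diags_dir); these names are illustrative.
names = [
    "state_2d_set1.0000008640.meta",
    "state_2d_set1.0000008640.data",
    "state_3d_set1.0000008640.meta",
]
print(iterations_for_prefix(names, "state_2d_set1"))  # [8640]
```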

and getting

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 2
      1 # however in this notebook I'm mostly concerned with understanding the bathymetry, so I'll just infile and name an iter0 dataset (kind of a dummy,
----> 2 ds = xmitgcm.open_mdsdataset(llc540_dir, grid_dir = llc540_dir_grid,prefix = ['state_2d_set1'], geometry=geometry,delta_t=delta_t, ref_date = ref_date,iters=iterations[0])
      3 # ds = add_latlon(ds)
      4 # grid = xgcm.Grid(ds, periodic='X')
      5 # ds

File ~/miniforge3/envs/pangeo/lib/python3.11/site-packages/xmitgcm/mds_store.py:273, in open_mdsdataset(data_dir, grid_dir, iters, prefix, read_grid, delta_t, ref_date, calendar, levels, geometry, grid_vars_to_coords, swap_dims, endian, chunks, ignore_unknown_vars, default_dtype, nx, ny, nz, llc_method, extra_metadata, extra_variables)
    270                 ds = _set_coords(ds)
    271             return ds
--> 273 store = _MDSDataStore(data_dir, grid_dir, iternum, delta_t, read_grid,
    274                       prefix, ref_date, calendar,
    275                       geometry, endian,
    276                       ignore_unknown_vars=ignore_unknown_vars,
    277                       default_dtype=default_dtype,
    278                       nx=nx, ny=ny, nz=nz, llc_method=llc_method,
    279                       levels=levels, extra_metadata=extra_metadata,
    280                      extra_variables=extra_variables)
    282 ds = xr.Dataset.load_store(store)
    283 if swap_dims:

File ~/miniforge3/envs/pangeo/lib/python3.11/site-packages/xmitgcm/mds_store.py:596, in _MDSDataStore.__init__(self, data_dir, grid_dir, iternum, delta_t, read_grid, file_prefixes, ref_date, calendar, geometry, endian, ignore_unknown_vars, default_dtype, nx, ny, nz, llc_method, levels, extra_metadata, extra_variables)
    593 # Create masks from hFac variables
    594 data = self.calc_masks(vname, data)
--> 596 thisvar = xr.Variable(dims, data, attrs)
    597 self._variables[vname] = thisvar

File ~/miniforge3/envs/pangeo/lib/python3.11/site-packages/xarray/core/variable.py:367, in Variable.__init__(self, dims, data, attrs, encoding, fastpath)
    347 """
    348 Parameters
    349 ----------
   (...)
    364     unrecognized encoding items.
    365 """
    366 self._data = as_compatible_data(data, fastpath=fastpath)
--> 367 self._dims = self._parse_dimensions(dims)
    368 self._attrs = None
    369 self._encoding = None

File ~/miniforge3/envs/pangeo/lib/python3.11/site-packages/xarray/core/variable.py:683, in Variable._parse_dimensions(self, dims)
    681     dims = tuple(dims)
    682 if len(dims) != self.ndim:
--> 683     raise ValueError(
    684         f"dimensions {dims} must have the same length as the "
    685         f"number of data dimensions, ndim={self.ndim}"
    686     )
    687 return dims

ValueError: dimensions ('time', 'j', 'i') must have the same length as the number of data dimensions, ndim=2
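The last frame of the traceback shows where the message comes from. `xr.Variable` compares the number of dimension names it is given with the number of axes in the data. Here, xmitgcm labels the variable with three dimensions (`time`, `j`, `i`), but the array it read has only two axes. The stdlib-only sketch below mirrors that check; it is not xarray's code, and `check_dims` is a hypothetical name. It uses the record shape from the LLC270 .meta file as an example and shows that the same record with a leading length-1 time axis would pass.

```python
def check_dims(dims, shape):
    """Mirror of the length check in the traceback: one name per data axis."""
    dims = tuple(dims)
    if len(dims) != len(shape):
        raise ValueError(
            f"dimensions {dims} must have the same length as the "
            f"number of data dimensions, ndim={len(shape)}"
        )
    return dims

# A single 2D record (ny, nx) labelled with a time dimension fails ...
try:
    check_dims(("time", "j", "i"), (310, 1080))
except ValueError as err:
    print(err)
# ... while the same record with a leading length-1 time axis passes.
print(check_dims(("time", "j", "i"), (1, 310, 1080)))  # ('time', 'j', 'i')
```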

Hi @ruth-moorman, does it work to either not specify iters, or specify iters=[iterations[0]]? The type specification for the iters argument is a list, so this could be it. That's just a guess though...
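On the `iters` point, `iters=iterations[0]` and `iters=[iterations[0]]` request the same single iteration: one as a bare integer, the other as a one-element list. A hypothetical normalising helper makes the equivalence explicit. It sketches the idea and does not reproduce xmitgcm's own argument handling.

```python
def normalize_iters(iters, available):
    """Turn an iters value into a list of iteration numbers (sketch only).

    'all' -> every available iteration, an int -> [int],
    any other iterable -> list of its items.
    """
    if isinstance(iters, str) and iters == "all":
        return sorted(available)
    if isinstance(iters, int):
        return [iters]
    return list(iters)

available = [8640]
print(normalize_iters(8640, available), normalize_iters([8640], available))  # [8640] [8640]
```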

@timothyas thanks for the suggestion, but it doesn't look like it's the iters: iters=iterations[0], iters='all', no iters input, and iters=[iterations[0]] all give the same error for the 2D fields. To stress, in case it helps: I do not get this error with 3D fields for any of those values of iters.

I.e., this works totally fine:
xmitgcm.open_mdsdataset(llc540_dir, grid_dir = llc540_dir_grid,prefix = ['layers_3d_set2','fluxes_3d_set1','trsp_3d_set1','state_3d_set1'], geometry=geometry, delta_t=delta_t, ref_date = ref_date,iters=iterations[0])

Hi @ruth-moorman, too bad that wasn't the issue. I'm not really sure what's going on. I cannot reproduce the error using the curvilinear_leman dataset in xmitgcm's test suite. If there's any way you can make the data public, I'd be happy to help you out further. I'm also curious how and why you are using a curvilinear geometry for llc540 output: is the entire model domain on just one of the llc faces?