bcdev/nc2zarr

Do not append variables missing append_dim

Closed this issue · 2 comments

nc2zarr fails when appending variables that do not have the dimension indicated by `append_dim`, e.g. "time".
Variables that lack the `append_dim` dimension should be written once and from then on be excluded from appending.

Here is an example from the SST L4 GHRSST GMP source products:

2021-06-03 11:39:54,121: INFO: nc2zarr: 365 input(s) found:
  0: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/01/19820101120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
  1: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/02/19820102120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
...
  364: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/12/31/19821231120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
2021-06-03 11:39:54,122: INFO: nc2zarr: Processing input 1 of 365: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/01/19820101120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
2021-06-03 11:39:54,300: INFO: nc2zarr: Opening done: took 0.18 seconds
2021-06-03 11:39:56,486: INFO: nc2zarr: Writing dataset done: took 2.16 seconds
2021-06-03 11:39:56,497: INFO: nc2zarr: Processing input 2 of 365: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/02/19820102120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
2021-06-03 11:39:56,607: INFO: nc2zarr: Opening done: took 0.11 seconds
2021-06-03 11:39:56,650: ERROR: nc2zarr: Appending dataset failed: took 0.02 seconds
2021-06-03 11:39:56,650: ERROR: nc2zarr: Converting failed: took 3.23 seconds
Traceback (most recent call last):
  File "/apps/slurm/spool/slurmd/job56309097/slurm_script", line 33, in <module>
    sys.exit(load_entry_point('nc2zarr', 'console_scripts', 'nc2zarr')())
...
  File "/home/users/forman/Projects/nc2zarr/nc2zarr/writer.py", line 82, in write_dataset
    retry.api.retry_call(self._write_dataset,
  File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/home/users/forman/Projects/nc2zarr/nc2zarr/writer.py", line 98, in _write_dataset
    self._append_dataset(ds)
  File "/home/users/forman/Projects/nc2zarr/nc2zarr/writer.py", line 146, in _append_dataset
    ds.to_zarr(self._output_store,
  File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/core/dataset.py", line 1790, in to_zarr
    return to_zarr(
  File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/backends/api.py", line 1452, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/backends/api.py", line 1300, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/backends/api.py", line 1291, in check_dtype
    raise ValueError(
ValueError: Invalid dtype for data variable: <xarray.DataArray 'field_name' (fields: 16, field_name_length: 50)>
dask.array<array, shape=(16, 50), dtype=|S1, chunksize=(16, 50), chunktype=numpy.ndarray>
Coordinates:
  * fields             (fields) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * field_name_length  (field_name_length) int32 1 2 3 4 5 6 ... 46 47 48 49 50 dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object

The actual error here is (I think) caused by an xarray bug, which I reported as pydata/xarray#5224: xarray should allow appending this variable. But it is a nc2zarr bug as well, since even if xarray could append those |S1-typed variables correctly, it would not make sense to do so.
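The intended behavior could be sketched like this. This is a minimal illustration only, not nc2zarr's actual writer code; the helper name `split_for_append` and the toy variable names are made up for the example. The idea is to partition the dataset so that only variables carrying `append_dim` are passed to the append call, while the rest are written once on the first write and dropped thereafter:

```python
import numpy as np
import xarray as xr


def split_for_append(ds: xr.Dataset, append_dim: str = "time"):
    """Partition a dataset into variables that carry append_dim
    (safe to append) and variables that do not (write once only)."""
    appendable = [n for n, v in ds.data_vars.items() if append_dim in v.dims]
    static = [n for n in ds.data_vars if n not in appendable]
    return ds[appendable], ds[static]


# Toy stand-ins for the GMPE case: "analysed_sst" carries "time",
# while "field_name" is a |S1 char array without it.
ds = xr.Dataset(
    {
        "analysed_sst": (("time", "lat"), np.zeros((1, 2))),
        "field_name": (
            ("fields", "field_name_length"),
            np.full((2, 3), b"x", dtype="S1"),
        ),
    }
)

to_append, write_once = split_for_append(ds)
print(list(to_append.data_vars))   # ['analysed_sst']
print(list(write_once.data_vars))  # ['field_name']
```

On the first input, both parts would be written; on every subsequent input, only `to_append` would go through `to_zarr(..., mode="a", append_dim="time")`, which also sidesteps the |S1 dtype check that raises above.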

> The actual error here is (I think) caused by an xarray bug which I reported here: pydata/xarray#5224.

Right!