Do not append variables missing append_dim
Closed this issue · 2 comments
forman commented
nc2zarr fails when appending variables that do not have the dimension indicated by `append_dim`, e.g. `"time"`.
Variables that lack the `append_dim` dimension should be written once and from then on be excluded from appending.
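A minimal sketch of the proposed behaviour, not the actual nc2zarr implementation. The helper name `vars_without_dim` and the variable `sst` are hypothetical; `field_name` and its dimensions are taken from the traceback in this issue. The helper works on a plain mapping of variable name to dimension names, so it runs without xarray.

```python
from typing import List, Mapping, Tuple


def vars_without_dim(var_dims: Mapping[str, Tuple[str, ...]],
                     append_dim: str) -> List[str]:
    """Return the names of variables that lack the append dimension.

    Hypothetical helper: these are the variables that would be written
    with the first input and excluded from every subsequent append.
    """
    return [name for name, dims in var_dims.items()
            if append_dim not in dims]


# Illustrative layout: "sst" is an assumed name, "field_name" and its
# dimensions come from the traceback in this issue.
var_dims = {
    "time": ("time",),
    "sst": ("time", "lat", "lon"),
    "field_name": ("fields", "field_name_length"),
}

to_exclude = vars_without_dim(var_dims, "time")
print(to_exclude)
```

With an xarray dataset `ds`, the mapping could be built as `{name: var.dims for name, var in ds.variables.items()}`, and the result passed to `ds.drop_vars(...)` before calling `to_zarr(..., append_dim="time")` for the second and later inputs.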
Here is an example from the SST L4 GHRSST GMPE source products:
2021-06-03 11:39:54,121: INFO: nc2zarr: 365 input(s) found:
0: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/01/19820101120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
1: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/02/19820102120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
...
364: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/12/31/19821231120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
2021-06-03 11:39:54,122: INFO: nc2zarr: Processing input 1 of 365: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/01/19820101120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
2021-06-03 11:39:54,300: INFO: nc2zarr: Opening done: took 0.18 seconds
2021-06-03 11:39:56,486: INFO: nc2zarr: Writing dataset done: took 2.16 seconds
2021-06-03 11:39:56,497: INFO: nc2zarr: Processing input 2 of 365: /neodc/esacci/sst/data/gmpe/CDR_V2/L4/v2.0/1982/01/02/19820102120000-ESACCI-L4_GHRSST-SST-GMPE-GLOB_CDR2.0-v02.0-fv01.0.nc
2021-06-03 11:39:56,607: INFO: nc2zarr: Opening done: took 0.11 seconds
2021-06-03 11:39:56,650: ERROR: nc2zarr: Appending dataset failed: took 0.02 seconds
2021-06-03 11:39:56,650: ERROR: nc2zarr: Converting failed: took 3.23 seconds
Traceback (most recent call last):
File "/apps/slurm/spool/slurmd/job56309097/slurm_script", line 33, in <module>
sys.exit(load_entry_point('nc2zarr', 'console_scripts', 'nc2zarr')())
...
File "/home/users/forman/Projects/nc2zarr/nc2zarr/writer.py", line 82, in write_dataset
retry.api.retry_call(self._write_dataset,
File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/retry/api.py", line 101, in retry_call
return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/home/users/forman/Projects/nc2zarr/nc2zarr/writer.py", line 98, in _write_dataset
self._append_dataset(ds)
File "/home/users/forman/Projects/nc2zarr/nc2zarr/writer.py", line 146, in _append_dataset
ds.to_zarr(self._output_store,
File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/core/dataset.py", line 1790, in to_zarr
return to_zarr(
File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/backends/api.py", line 1452, in to_zarr
_validate_datatypes_for_zarr_append(dataset)
File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/backends/api.py", line 1300, in _validate_datatypes_for_zarr_append
check_dtype(k)
File "/home/users/forman/miniconda3/envs/nc2zarr/lib/python3.9/site-packages/xarray/backends/api.py", line 1291, in check_dtype
raise ValueError(
ValueError: Invalid dtype for data variable: <xarray.DataArray 'field_name' (fields: 16, field_name_length: 50)>
dask.array<array, shape=(16, 50), dtype=|S1, chunksize=(16, 50), chunktype=numpy.ndarray>
Coordinates:
* fields (fields) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
* field_name_length (field_name_length) int32 1 2 3 4 5 6 ... 46 47 48 49 50 dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object
pont-us commented
The actual error here is (I think) caused by an xarray bug which I reported here: pydata/xarray#5224. xarray should allow appending of this variable. But of course it's an nc2zarr bug as well, since even if xarray could append those `|S1`-typed variables correctly, it doesn't make sense to do so.
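For context, `|S1` is NumPy's notation for a fixed-size byte string of length 1, which is a kind of dtype the error message itself lists as acceptable. The small check below assumes only that NumPy is installed. It inspects the dtype and does not reproduce xarray's internal validation.

```python
import numpy as np

# The dtype reported in the traceback for 'field_name'.
dtype = np.dtype("|S1")

print(dtype.kind)      # kind code of the dtype ('S' means byte string)
print(dtype.itemsize)  # size of one element in bytes

# A byte string is not a numeric dtype, but it is a fixed-size string.
is_number = np.issubdtype(dtype, np.number)
is_bytes = np.issubdtype(dtype, np.bytes_)
print(is_number, is_bytes)
```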
forman commented
> The actual error here is (I think) caused by an xarray bug which I reported here: pydata/xarray#5224.

Right!