`fill_cdpna()` causes crash in output
aadm opened this issue · 2 comments
With input data read from NetCDF via dask, applying `fill_cdpna()` causes an error when writing the file back to NetCDF.
One workaround is to 'realise' the data in memory and then write it to disk, e.g.:

```python
cube.compute().seisio.to_netcdf('cube_crop.seisnc')
```

But this is not applicable when the data are very large: it would crash simply because there is not enough RAM to hold the whole cube in memory.
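As a rough check on whether `.compute()` is even an option, you can estimate the cube's in-memory footprint first (an xarray Dataset exposes the same total via `ds.nbytes`). The survey dimensions below are hypothetical, purely for illustration:

```python
import numpy as np

# Hedged sketch: estimate the realised size of a seismic cube before
# calling .compute(). The dimensions here are made up; on a real
# dataset, cube.nbytes gives the same total directly.
n_iline, n_xline, n_twt = 2000, 2000, 1500  # hypothetical survey size
itemsize = np.dtype("float32").itemsize     # seisnc data is typically float32
total_gb = n_iline * n_xline * n_twt * itemsize / 1e9
print(f"{total_gb:.1f} GB")  # 24.0 GB -- more RAM than most workstations have
```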
Example:

```python
cube = open_seisnc(local_data + 'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube.seis.fill_cdpna()
cube = cube.sel(twt=slice(3800, 5500))
cube.seisio.to_netcdf('cube_crop.seisnc')
```
Resulting error log copied from jupyter console:
In [7]: cube.seisio.to_netcdf('cube_crop.seisnc')
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-0f071262ca7b> in <module>
----> 1 near.seisio.to_netcdf(local_data+'md4_near_cond_crop.seisnc')
~/miniconda3/envs/wrk/lib/python3.7/site-packages/segysak/_accessor.py in to_netcdf(self, seisnc, **kwargs)
46 kwargs["engine"] = "h5netcdf"
47
---> 48 self._obj.to_netcdf(seisnc, **kwargs)
49
50
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1566 unlimited_dims=unlimited_dims,
1567 compute=compute,
-> 1568 invalid_netcdf=invalid_netcdf,
1569 )
1570
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1080 # to be parallelized with dask
1081 dump_to_store(
-> 1082 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1083 )
1084 if autoclose:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1126 variables, attrs = encoder(variables, attrs)
1127
-> 1128 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
1129
1130
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
296 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
297 self.set_variables(
--> 298 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
299 )
300
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
334 check = vn in check_encoding_set
335 target, source = self.prepare_variable(
--> 336 name, v, check, unlimited_dims=unlimited_dims
337 )
338
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
287 dimensions=variable.dims,
288 fillvalue=fillvalue,
--> 289 **kwargs,
290 )
291 else:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in create_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
499 group = group._require_child_group(k)
500 return group._create_child_variable(keys[-1], dimensions, dtype, data,
--> 501 fillvalue, **kwargs)
502
503 def _get_child(self, key):
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _create_child_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
474 h5ds = self._h5group[name]
475 if _netcdf_dimension_but_not_variable(h5ds):
--> 476 self._detach_dim_scale(name)
477 del self._h5group[name]
478
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _detach_dim_scale(self, name)
572 for n, dim in enumerate(var.dimensions):
573 if dim == name:
--> 574 var._h5ds.dims[n].detach_scale(self._all_h5groups[dim])
575
576 for subgroup in self.groups.values():
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5py/_hl/dims.py in detach_scale(self, dset)
101 """
102 with phil:
--> 103 h5ds.detach_scale(self._id, dset.id, self._dimension)
104
105 def items(self):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5ds.pyx in h5py.h5ds.detach_scale()
h5py/defs.pyx in h5py.defs.H5DSdetach_scale()
RuntimeError: Unspecified error in H5DSdetach_scale (return value <0)
Thanks @aadm, I'm not sure if this can be fixed but I'll have a look. An alternative option that is probably simpler for us is to allow `fill_cdpna()` to be run during the conversion process, so that it would be implicit in the NetCDF4 file. It is a little frustrating that it is not easy to modify parts of a NetCDF4 file on disk.

One option I might get you to try: open the file without using dask (this won't load it straight away), run `fill_cdpna()`, and then use the `to_netcdf()` method with `mode='a'` to append the modified variables. Then try your cropping operation using dask.
So your example would be:

```python
cube = open_seisnc(local_data + 'cube.seisnc')
cube.seis.fill_cdpna()
# Note: this modifies the file in place, so you might want to back it up first.
cube.seisio.to_netcdf(local_data + 'cube.seisnc', mode='a')
cube = open_seisnc(local_data + 'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube = cube.sel(twt=slice(3800, 5500))
cube.seisio.to_netcdf('cube_crop.seisnc')
```
This has been addressed in v0.5 with lazy loading support and a reversion to xarray's native NetCDF methods.

Note: please use the new accessor namespace method `.segysak.fill_cdpna()`.

Note: the current release candidate has a bug, but the actual release will fix a uint downcasting issue that isn't compatible with `xr.to_netcdf`.
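The uint issue can be sketched independently of segysak: classic NetCDF formats have no unsigned integer types, so unsigned variables generally need a cast before writing. A minimal, hedged illustration of such a cast (the helper name and variable are hypothetical, not segysak API):

```python
import numpy as np

# Hedged sketch: cast unsigned-integer arrays to a signed dtype that
# classic NetCDF formats accept, before handing data to xr.to_netcdf.
def netcdf_safe(arr: np.ndarray) -> np.ndarray:
    """Return a signed-int copy of unsigned arrays; pass others through."""
    if arr.dtype.kind == "u":
        # int64 can hold every uint32 value without overflow
        return arr.astype("int64")
    return arr

trace_ids = np.arange(4, dtype="uint32")
print(netcdf_safe(trace_ids).dtype)  # int64
```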