Extension to the HDF5 chunks API
davidhassell commented
Currently (v1.11.1.0), the treatment of HDF5 chunking is a bit inadequate:
- Chunking can only be set on a per-Data object basis
- Chunking can only be defined by explicitly setting the chunks shape on each axis
- Chunking is ignored in an output file unless native compression is on
- Chunks from an input file are not stored
A more comprehensive and flexible API is needed:
- cfdm.write should chunk by default, and have a keyword argument (hdf5_chunks) to configure the default chunking.
- cfdm.read should, by default, store HDF5 chunking on the returned data, so that it will be used when writing out to a new netCDF4 file.
- Setting a HDF5 chunking strategy should be more intuitive. E.g. it should be easy to "chunk the time axis by 12 elements, leaving all other axes unchunked":
f.nc_set_hdf_chunksizes({'T': 12})
- Setting HDF5 chunksizes should follow the Dask API for defining its computational chunk sizes. E.g.
f.nc_set_hdf_chunksizes("8 MiB")
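To make the two proposed forms concrete, here is a rough sketch of how each could be resolved to an explicit per-axis chunk shape. It is illustrative only and not cfdm code: the helper names chunks_from_axes and chunks_from_bytes are hypothetical, the axis identities are passed in by hand, and the halving heuristic is a simple stand-in, not the algorithm Dask uses.

```python
from math import prod


def chunks_from_axes(shape, axes, spec):
    """Resolve a {axis identity: chunk size} mapping to a chunk shape.

    Hypothetical helper. Axes not named in `spec` are left unchunked,
    i.e. a single chunk spans the whole axis.
    """
    unknown = set(spec) - set(axes)
    if unknown:
        raise ValueError(f"Unknown axes in chunk specification: {sorted(unknown)}")

    # A chunk can never be larger than the axis it belongs to
    return tuple(min(spec.get(axis, n), n) for axis, n in zip(axes, shape))


def chunks_from_bytes(shape, itemsize, max_bytes):
    """Find a chunk shape whose size in bytes does not exceed `max_bytes`.

    Hypothetical helper. Repeatedly halves the largest chunk dimension;
    this is a simple heuristic, not Dask's chunking algorithm.
    """
    chunks = list(shape)
    while prod(chunks) * itemsize > max_bytes and max(chunks) > 1:
        i = chunks.index(max(chunks))
        chunks[i] = (chunks[i] + 1) // 2  # halve, rounding up
    return tuple(chunks)


# "Chunk the time axis by 12 elements, leaving all other axes unchunked"
by_axis = chunks_from_axes((120, 73, 96), ("T", "Y", "X"), {"T": 12})
# (12, 73, 96)

# An 8 MiB limit for 8-byte floats
by_size = chunks_from_bytes((1200, 73, 96), itemsize=8, max_bytes=8 * 2**20)
# (75, 73, 96)
```

In both cases the result is the kind of explicit chunk shape that currently has to be given by hand for each axis.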
PR to follow.