NCAS-CMS/cfdm

Extension to the HDF5 chunks API


Currently (v1.11.1.0), the treatment of HDF5 chunking is inadequate in several ways:

  • Chunking can only be set on a per-Data object basis
  • Chunking can only be defined by explicitly setting the chunks shape on each axis
  • Chunking is ignored in an output file unless native compression is on
  • Chunk sizes from an input file are not stored

A more comprehensive and flexible API is needed:

  • cfdm.write should chunk by default, and should have a keyword argument (hdf5_chunks) to configure the default chunking.
  • cfdm.read should, by default, store the HDF5 chunking on the returned data, so that it is used when writing to a new netCDF4 file.
  • Setting an HDF5 chunking strategy should be more intuitive. For example, it should be easy to "chunk the time axis by 12 elements, leaving all other axes unchunked": f.nc_set_hdf_chunksizes({'T': 12})
  • Setting HDF5 chunk sizes should follow the Dask API for defining computational chunk sizes. For example: f.nc_set_hdf_chunksizes("8 MiB")
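Both proposed forms (a per-axis dictionary and a byte-size string) ultimately have to be turned into an explicit chunk shape before a netCDF4 variable can be written. The following is a minimal pure-Python sketch of that resolution step. The helpers parse_size and resolve_chunks are hypothetical and are not part of cfdm. It assumes that axes not named in a dictionary stay unchunked (full length), and that a byte-size target is met by shrinking the leading (slowest-varying) axes first. This is a simple heuristic, not Dask's own auto-chunking algorithm.

```python
import math
import re

# Hypothetical sketch, not the cfdm API: turn a user-facing chunking
# specification into an explicit HDF5 chunk shape.

# Decimal and binary byte units (matched case-insensitively)
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9,
          "KIB": 2**10, "MIB": 2**20, "GIB": 2**30}


def parse_size(text):
    """Parse a byte-size string such as '8 MiB' into a number of bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if m is None:
        raise ValueError(f"Can't parse size: {text!r}")
    number, unit = m.groups()
    factor = _UNITS.get(unit.upper())
    if factor is None:
        raise ValueError(f"Unknown size unit: {unit!r}")
    return int(float(number) * factor)


def resolve_chunks(shape, axes, itemsize, spec):
    """Return an explicit chunk shape (tuple of ints) for a variable.

    shape    -- the variable's shape, e.g. (120, 180, 360)
    axes     -- one identifier per dimension, e.g. ('T', 'Y', 'X')
    itemsize -- bytes per element, e.g. 8 for float64
    spec     -- dict {axis: size}, or a byte-size string such as '8 MiB'
    """
    chunks = list(shape)  # start with every axis unchunked

    if isinstance(spec, dict):
        # Only the named axes are chunked; sizes are clipped to [1, axis size]
        for axis, size in spec.items():
            i = axes.index(axis)
            chunks[i] = max(1, min(int(size), shape[i]))
        return tuple(chunks)

    target = parse_size(spec)
    for i in range(len(chunks)):
        if math.prod(chunks) * itemsize <= target:
            break
        # Bytes in one slab spanning the trailing axes at their current sizes
        rest = math.prod(chunks[i + 1:]) * itemsize
        # Largest number of slabs along axis i that fits within the target
        chunks[i] = max(1, min(shape[i], target // rest))
    return tuple(chunks)


# "Chunk the time axis by 12 elements, leaving all other axes unchunked"
print(resolve_chunks((120, 180, 360), ("T", "Y", "X"), 8, {"T": 12}))
# -> (12, 180, 360)

# Aim for chunks of at most 8 MiB of float64 data
print(resolve_chunks((120, 180, 360), ("T", "Y", "X"), 8, "8 MiB"))
# -> (16, 180, 360)
```

Keeping the trailing axes whole for as long as possible means each chunk holds complete contiguous rows, which tends to suit the common pattern of reading whole spatial fields one time step at a time. Other strategies, such as Dask's, balance the chunk shape across axes differently.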

PR to follow.