monocongo/climate_indices

option to preserve chunking?

Closed issue · 1 comment

Is your feature request related to a problem? Please describe.
I am using process_climate_indices in a data pipeline, and this step runs on a fairly large, dedicated compute resource. The preceding steps take advantage of some carefully chosen netCDF4 chunk sizes, which indirectly helps process_climate_indices with its multiprocessing.

Some later steps operate on the NetCDF files generated by process_climate_indices, but they don't run as efficiently as I'd like because the outputs are written with contiguous storage. Since compute_climate_indices ends up holding the climate index entirely in memory, it could write the output either chunked or contiguous with no real difference in performance.

Describe the solution you'd like
I think it would be useful to read the chunk sizes of the input files (if any) and apply them to the output files. Correct me if I'm wrong, but the input dimensions are always the same as the output dimensions, so we should be able to add a --chunksizes inputs option that means "use the input file's chunk sizes in the output files". This could also open the door to specifying chunk sizes explicitly in the future.
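A minimal sketch of how this might look, not the project's actual code. Only the --chunksizes inputs spelling comes from the proposal above; the "none" default, the function names build_parser and output_chunk_kwargs, and the idea of passing the result to netCDF4's createVariable are assumptions for illustration.

```python
import argparse


def build_parser():
    # Hypothetical CLI fragment: only "--chunksizes inputs" is from the proposal;
    # the "none" choice and default are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--chunksizes",
        choices=["none", "inputs"],
        default="none",
        help="'inputs' reuses the input variable's chunk sizes for the output",
    )
    return parser


def output_chunk_kwargs(mode, input_chunking):
    """Map the option and the input's chunking to createVariable keyword arguments.

    input_chunking is assumed to be what netCDF4.Variable.chunking() returns:
    the string "contiguous" or a list of chunk sizes, one per dimension.
    """
    if mode != "inputs" or input_chunking in (None, "contiguous"):
        return {}  # leave the storage layout to the library default
    return {"chunksizes": tuple(int(n) for n in input_chunking)}
```

With the netCDF4 package, the returned dictionary could then be expanded into a call such as createVariable(name, dtype, dims, **kwargs), since createVariable accepts a chunksizes argument. This relies on the input and output variables sharing the same dimensions, as assumed above.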

If this would be welcomed, I think I can put together a PR that does this in fairly short order.

This sounds like a good idea to me. Please have at it! Thanks in advance for your contributions.