thunder-project/thunder

block sizing

freeman-lab opened this issue · 0 comments

After extensive benchmarking with @jwittenbach, we identified a few issues with how block/chunk sizes are currently computed, and in particular found that the default value of 150 is too large. We came up with a plan that should significantly improve performance.

For the following functions:

  • images.toblocks()
  • images.toseries()
  • images.map_as_series()

The argument currently known as block_size or size should be consistently named chunk_size. If given a string, it will be treated as kb, not mb, and should be documented as such. The default value should be auto, where auto tries to compute a chunk_size that results in ~100mb blocks. Note the distinction: a block includes all dimensions, whereas a chunk is defined only in image space.
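A rough sketch of how the interface might look once the rename lands (the chunk_size name, the auto default, and the kb interpretation are the proposal above, not current behavior; fromarray is only used here to build a small stand-in dataset):

```python
import numpy as np
import thunder as td

# small stand-in for a (9000, 512, 512) recording
data = td.images.fromarray(np.zeros((90, 64, 64), dtype='uint16'))

# proposed: one consistently named argument, defaulting to 'auto',
# which targets ~100mb blocks
blocks = data.toblocks(chunk_size='auto')

# proposed: strings are interpreted as kb, not mb
series = data.toseries(chunk_size='10')
```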

For example, for (9000, 512, 512) data, the empirically optimal chunk_size is around 10kb, which results in blocks of shape (9000, 9, 512), i.e. around 100mb each.
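For illustration, one way the auto heuristic could compute this (a sketch only; the function name and the 1000-bytes-per-kb convention are assumptions, not the actual implementation):

```python
def auto_chunk_size_kb(shape, target_block_mb=100):
    # a block spans all dimensions, while a chunk covers only image space,
    # so a block is one chunk's worth of pixels at every time point;
    # targeting ~100mb blocks therefore means chunks of roughly
    # target_block_mb / n_timepoints
    n_timepoints = shape[0]
    return (target_block_mb * 1000.0) / n_timepoints

# (9000, 512, 512) data -> ~11kb per chunk, in line with the empirically
# optimal ~10kb above; for 2-byte pixels that is about 9 rows of 512
# pixels, i.e. (9000, 9, 512) blocks
print(auto_chunk_size_kb((9000, 512, 512)))  # ~11.1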

For the time being, manually specifying sizes much lower than the current default could significantly improve performance for these operations.
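For instance, with the current argument names (size or block_size, depending on the function), something along these lines should already help for data like the example above:

```python
# hypothetical values; anything well below the current default of 150,
# e.g. ~10, is much closer to the empirically optimal chunk size for
# (9000, 512, 512) data
blocks = data.toblocks(size='10')
series = data.toseries(size='10')
```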

cc @d-v-b