block sizing
freeman-lab opened this issue · 0 comments
After extensive benchmarking with @jwittenbach, we realized a few issues with how block/chunk sizes are currently computed, and in particular found that the default value of `150` is too large. We came up with a plan that should significantly improve performance.
For the following functions:

- `images.toblocks()`
- `images.toseries()`
- `images.map_as_series()`
The argument currently known as `block_size` or `size` should be consistently named `chunk_size`. If given a string, it will be treated as `kb`, not `mb`, and should be documented as such. The default values should be `auto`, where `auto` tries to compute a `chunk_size` that will result in ~100mb blocks; a block includes all dimensions, whereas a chunk is defined only in image space.
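As a rough sketch of how such an `auto` heuristic could work (the helper name, the 100mb target constant, and the `uint16` dtype below are illustrative assumptions, not the actual implementation):

```python
import numpy as np

def auto_chunk_size(shape, dtype, target_block_mb=100):
    """Hypothetical heuristic for chunk_size='auto'.

    Picks a chunk budget (in kb of image-space data) such that a chunk,
    extended across the full leading time/record dimension, yields a block
    of roughly target_block_mb. Chunks live in image space only; blocks
    span all dimensions.
    """
    ntime = shape[0]
    itemsize = np.dtype(dtype).itemsize
    # each image-space element contributes ntime * itemsize bytes to a block
    elements_per_chunk = (target_block_mb * 1e6) / (ntime * itemsize)
    # express that budget in kb, matching the proposed string units
    return elements_per_chunk * itemsize / 1e3

# for (9000, 512, 512) uint16 data this lands near the ~10kb figure
# used in the example below (the dtype here is an assumption)
print(auto_chunk_size((9000, 512, 512), 'uint16'))  # ~11 kb
```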
For example, for `(9000, 512, 512)` data, the empirically optimal `chunk_size` is around `10kb`, which results in blocks of size `(9000, 9, 512)`, or around 100mb.
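A quick back-of-the-envelope check of those numbers, assuming 2-byte (`uint16`) pixels (the exact figures shift with dtype):

```python
import numpy as np

itemsize = np.dtype('uint16').itemsize    # assumed 2-byte pixels; adjust for your dtype

# a chunk covers image space only: here a (9, 512) slab of each image
chunk_bytes = 9 * 512 * itemsize          # ~9 kb
# the corresponding block extends that slab across all 9000 records
block_bytes = 9000 * 9 * 512 * itemsize   # ~83 mb, i.e. on the order of 100mb

print(chunk_bytes / 1e3, 'kb per chunk')
print(block_bytes / 1e6, 'mb per block')
```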
For the time being, manually specifying sizes much lower than the current default could significantly improve performance for these operations.
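For illustration, something like the following, using the current `size` keyword described above (a sketch only, not necessarily the exact API of any particular release):

```python
import numpy as np
import thunder as td

# small stand-in data; in practice this would come from disk
data = td.images.fromarray(np.random.randn(100, 512, 512))

# the argument is currently called `size` on these methods (see above);
# passing a value well below the old default of 150 can help substantially
blocks = data.toblocks(size='10')
series = data.toseries(size='10')
```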
cc @d-v-b