ilastik/lazyflow

blocked cache operator memory usage scales with overall dataset size

stuarteberg opened this issue

OpBlockedArrayCache initializes bookkeeping data structures whose size scales with the input slot's shape. See, for example, the _dirtyShape member (1 byte per block) and the _flatBlockIndices member (8 bytes per block). This means the cache has a non-negligible memory footprint even if the cache output is never used.
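For a rough sense of scale, here is a back-of-envelope estimate (the dataset and block shapes below are illustrative, not measured from lazyflow; only the ~9 bytes/block figure comes from the members named above):

```python
import math

# Hypothetical sizes, for illustration only.
dataset_shape = (10_000, 10_000, 10_000)  # ~1 TB of uint8 voxels
block_shape = (64, 64, 64)

blocks_per_axis = [math.ceil(d / b) for d, b in zip(dataset_shape, block_shape)]
n_blocks = math.prod(blocks_per_axis)

# ~9 bytes of bookkeeping per block:
# 1 byte (_dirtyShape) + 8 bytes (_flatBlockIndices)
bookkeeping_bytes = n_blocks * (1 + 8)
print(f"{n_blocks:,} blocks -> ~{bookkeeping_bytes / 1e6:.0f} MB per cache operator")
# 3,869,893 blocks -> ~35 MB per cache operator
```

And that cost is paid per cache operator, whether or not the cache is ever read.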

One consequence is that headless scripts intended for use with large datasets (e.g. 1 TB) must be careful not to let their workflow even instantiate a cache, much less use it. It would be nice if this weren't the case.

Ideally, there would be no significant memory penalty for instantiating a cache, as long as it isn't actually used. Then we could use the same operators in headless workflows as we do in GUI workflows. The only difference would be that the GUI would request data from the output slots that are connected to the caches, whereas headless workflows would avoid those output slots.
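One possible approach (a minimal sketch, not lazyflow's actual API; the class and method names below are hypothetical) would be to defer allocating the per-block arrays until the first request arrives:

```python
import numpy as np

class BlockedCache:
    """Hypothetical sketch of a blocked cache with lazy bookkeeping."""

    def __init__(self, dataset_shape, block_shape):
        self.dataset_shape = dataset_shape
        self.block_shape = block_shape
        # Defer allocation: instantiating the cache costs almost nothing.
        self._dirty = None
        self._flat_indices = None

    def _ensure_bookkeeping(self):
        # Allocate the per-block arrays only on first use.
        if self._dirty is None:
            blocks = tuple(
                -(-d // b)  # ceiling division
                for d, b in zip(self.dataset_shape, self.block_shape)
            )
            self._dirty = np.ones(blocks, dtype=np.uint8)  # 1 byte/block
            self._flat_indices = np.arange(
                np.prod(blocks), dtype=np.int64
            ).reshape(blocks)                              # 8 bytes/block

    def execute(self, roi):
        # Called only when a downstream consumer actually requests data.
        self._ensure_bookkeeping()
        ...  # look up / fill blocks as usual
```

With something like this, headless workflows could wire up the same graph as the GUI, and an unused cache would cost only a couple of object attributes.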