Blosc/c-blosc2

Why no chunk skipping in sparse frame?

dmikushin opened this issue · 1 comment

I find the idea of sparse frames clear and very useful. But my intuition tells me that, since it is a sparse frame, there must be a way to skip chunks during writing, e.g. for out-of-order writing of chunks produced by distributed computing. I looked for this in the API and found nothing regarding this feature. Is my intuition wrong or atypical? What other meaning could the term "sparse" possibly have, then? Or is it a TODO feature?

Yes, you can do that with the current API. If you want to use the basic SChunk API, you can build a large container filled with default values using blosc2_schunk_fill_special(). This does not consume storage resources, because special values are kept with a special storage mechanism instead of as regular data, so the container can be as large as you want. Then, you can use blosc2_schunk_set_slice_buffer() to set the actual values, in any order. For reading, you can use its counterpart, blosc2_schunk_get_slice_buffer().

Alternatively, you may want to use the higher level Blosc2 NDim API; for example, use b2nd_empty() to create a big (empty) array, and then b2nd_set_slice_cbuffer() and b2nd_get_slice_cbuffer() for writing and reading respectively. These are the equivalents of the SChunk calls above, but for NDim containers, so internally they work the same.

Finally, you don't need the sparse format to use the APIs above, although it is more convenient for concurrent writing. Keep in mind that you will have to provide any locking yourself (if needed), as Blosc2 does not protect against concurrent processes writing to the same file.
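As one possible way to provide that locking, the sketch below uses a POSIX advisory lock on a separate lock file. This is not part of Blosc2: `lock_acquire`, `lock_release`, and the lock file itself are hypothetical names, and the approach only works if every writer process cooperates by taking the same lock. Behavior on network filesystems varies, so treat it as a starting point.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: block until an exclusive advisory lock on `lockpath`
 * is held. Returns an open file descriptor on success, -1 on failure. */
int lock_acquire(const char *lockpath) {
  assert(lockpath != NULL);
  int fd = open(lockpath, O_RDWR | O_CREAT, 0644);
  if (fd < 0) return -1;

  struct flock fl = {0};
  fl.l_type = F_WRLCK;     /* exclusive (write) lock */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;            /* 0 means the whole file */

  /* F_SETLKW waits for the lock; retry if a signal interrupts the wait. */
  while (fcntl(fd, F_SETLKW, &fl) == -1) {
    if (errno != EINTR) {
      close(fd);
      return -1;
    }
  }
  return fd;
}

/* Hypothetical helper: release the lock taken by lock_acquire() and close
 * the descriptor. Returns 0 on success, -1 on failure. */
int lock_release(int fd) {
  struct flock fl = {0};
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;

  int rc = fcntl(fd, F_SETLK, &fl);
  if (close(fd) == -1) rc = -1;
  return rc == -1 ? -1 : 0;
}
```

Each writer would then call `lock_acquire()` before its blosc2_schunk_set_slice_buffer() or b2nd_set_slice_cbuffer() call and `lock_release()` right after it. Note that these locks are per process, so they do not serialize threads inside a single process.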

Hope this helps.