eachchunk fall-back for transposed or offset arrays?
Alexander-Barth opened this issue · 1 comments
This is low priority issue for me, but when DiskArrays are used for memory arrays, the it might be useful to adapt the default chunksize for transposed arrays:
julia> DiskArrays.default_chunk_size[] = 1
1
julia> eachchunk(ones(4000,4000))
1×130 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
(1:4000, 1:31) (1:4000, 32:62) (1:4000, 63:93) … (1:4000, 4000:4000)
julia> eachchunk(ones(4000,4000)')
1×130 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
(1:4000, 1:31) (1:4000, 32:62) (1:4000, 63:93) … (1:4000, 4000:4000)
For a transposed array I would expect that the first chunk size of (1:31, 1:4000)
. I guess one can use stride
to check this.
I am note sure if OffsetArray
is supposed to be sported but currently all indices from eachchunk
are ones-based
using DiskArrays, OffsetArray
DiskArrays.default_chunk_size[] = 1
OA = OffsetArray(ones(4000,4000),(-1,-1));
c = eachchunk(OA);
OA[last(c)...]
# ERROR: BoundsError: attempt to access 4000×4000 OffsetArray(::Matrix{Float64}, 0:3999, 0:3999) with eltype Float64 with indices 0:3999×0:3999 at index [1:4000, 4000:4000]
None of these issue affect me currently in any production code, but I thought it might be useful to notify you about it.
I am using DiskArrays v0.3.22
Thanks, yes we should test on OffsetArray
. Correctness fixes are probably more important at this stage, the cost of wrong stride in Array will be lost in the cost of reading from disk (assuming this mostly happens in a broadcast/zip with a real disk-based array). But yes one day we could correct for that.