JuliaIO/DiskArrays.jl

readblock! outputs values in wrong positions

Closed this issue · 11 comments

As of this version, the output of this looks wrong.
using the file from here

But, it also happens for other .tif and .nc chunked files.

This looks to happened only when the data is stored into multi chunks, then the reading of those gets corrupted. If is only one, then no issue, or if we use ds["DQF"].data[1:2000, 1:2000] is also fine.

DiskArrays 0.4.4

using YAXArrays
using NetCDF
using GLMakie

ds = open_dataset("noaa.nc")
heatmap(ds["DQF"])
Screenshot 2024-09-15 at 10 56 11

DiskArrays 0.3.23 Outputs the correct array

Screenshot 2024-09-15 at 10 46 43

see also, rafaqz/Rasters.jl#735

ooh interesting, thanks Lazaro! Good to know that it's a DiskArrays thing.

Curious that NetCDF.jl also has the same problem with the _Unsigned attribute. (The random high value areas are actually sign errors from UInt16 to Int16)...

The main issue is that heatmap seems to fall back to reading a DiskArray through the iterator interface. That means the data is looped through chunk by chunk while Makie is assuming it goes through normal julia iteration order, A small reproducer would be this one:

using DiskArrays.TestTypes
using GLMakie

data = [i+j for i in 1:200, j in 1:100]
da = AccessCountDiskArray(data, chunksize=(10,10))


heatmap(da)

while the following plot would be correct:

heatmap(da[:,:])

A solution to the problem at hand would be to read the data into memory before plotting heatmap(ds["DQF"][:,:]).

This is a more general issue with collect(diskarray) then? Good to know, but a lot of people will use it naively.

Should collect specifically return what one would expect on a diskarray, instead of looping chunk-wise?

No collect does the right thing. Makie must be actually iterating with iterate somewhere. Would be good to find that and switch it to a broadcast.

iterate and eachindex are both out of order (in chunk order). But if you use them together you get the right answers. Of course that can break in places.

Hmm, looks like heatmap's convert_args call el32convert which calls elconvert. Have a look at these lines:
https://github.com/MakieOrg/Makie.jl/blob/6dfb420445065c56f5af7944c87ecdfc22eff507/src/conversions.jl#L707C1-L718C4
elconvert is calling map if it finds an abstract array with missings, or convert(AbstractArray{NewType}, input) otherwise. Either of these might be failing, I'm trying to narrow down where that might be happening.

OK, I have a smaller MWE:

map(identity, parent(ds["DQF"]))

gives me a bad heatmap.

map(f, A::AbstractArray) = collect_similar(A, Generator(f,A)) is the implementation that Julia points me to, and I can confirm that calling Base.collect_similar(parent(ds["DQF"]), Base.Generator(identity, parent(ds["DQF"]))) also has the same striping problem.

This is a more general problem, because map on a DiskArray may not work like you'd think it would. Maybe Base.collect_similar needs to be adjusted here, so that it iterates in the correct order?

Shouldn't DiskGenerator fix that? Maybe it's not being used somehow. I thought we had fixed this 😅

According to @which at least, it's not being used...

Ah this is #144 again, I hadn't seen that issue

Oh right we didn't actually fix that 😭

I don't have access to YAXArrays on my current system, but I tested with Rasters/DD/DiskArrays latest, and this is indeed working. Probably fixed by #198.