readblock! outputs values in wrong positions
Closed this issue · 11 comments
As of this version, the output of this looks wrong.
using the file from here
But, it also happens for other .tif and .nc chunked files.
This looks to happened only when the data is stored into multi chunks, then the reading of those gets corrupted. If is only one, then no issue, or if we use ds["DQF"].data[1:2000, 1:2000]
is also fine.
DiskArrays 0.4.4
using YAXArrays
using NetCDF
using GLMakie
ds = open_dataset("noaa.nc")
heatmap(ds["DQF"])
DiskArrays 0.3.23 Outputs the correct array
see also, rafaqz/Rasters.jl#735
ooh interesting, thanks Lazaro! Good to know that it's a DiskArrays thing.
Curious that NetCDF.jl also has the same problem with the _Unsigned
attribute. (The random high value areas are actually sign errors from UInt16 to Int16)...
The main issue is that heatmap seems to fall back to reading a DiskArray through the iterator interface. That means the data is looped through chunk by chunk while Makie is assuming it goes through normal julia iteration order, A small reproducer would be this one:
using DiskArrays.TestTypes
using GLMakie
data = [i+j for i in 1:200, j in 1:100]
da = AccessCountDiskArray(data, chunksize=(10,10))
heatmap(da)
while the following plot would be correct:
heatmap(da[:,:])
A solution to the problem at hand would be to read the data into memory before plotting heatmap(ds["DQF"][:,:])
.
This is a more general issue with collect(diskarray)
then? Good to know, but a lot of people will use it naively.
Should collect specifically return what one would expect on a diskarray, instead of looping chunk-wise?
No collect
does the right thing. Makie must be actually iterating with iterate
somewhere. Would be good to find that and switch it to a broadcast.
iterate
and eachindex
are both out of order (in chunk order). But if you use them together you get the right answers. Of course that can break in places.
Hmm, looks like heatmap's convert_args call el32convert
which calls elconvert
. Have a look at these lines:
https://github.com/MakieOrg/Makie.jl/blob/6dfb420445065c56f5af7944c87ecdfc22eff507/src/conversions.jl#L707C1-L718C4
elconvert
is calling map
if it finds an abstract array with missings, or convert(AbstractArray{NewType}, input)
otherwise. Either of these might be failing, I'm trying to narrow down where that might be happening.
OK, I have a smaller MWE:
map(identity, parent(ds["DQF"]))
gives me a bad heatmap.
map(f, A::AbstractArray) = collect_similar(A, Generator(f,A))
is the implementation that Julia points me to, and I can confirm that calling Base.collect_similar(parent(ds["DQF"]), Base.Generator(identity, parent(ds["DQF"])))
also has the same striping problem.
This is a more general problem, because map
on a DiskArray may not work like you'd think it would. Maybe Base.collect_similar
needs to be adjusted here, so that it iterates in the correct order?
Shouldn't DiskGenerator
fix that? Maybe it's not being used somehow. I thought we had fixed this 😅
According to @which
at least, it's not being used...
Ah this is #144 again, I hadn't seen that issue
Oh right we didn't actually fix that 😭
I don't have access to YAXArrays on my current system, but I tested with Rasters/DD/DiskArrays latest, and this is indeed working. Probably fixed by #198.