64-bit offset NetCDF and unlimited dimensions
Closed this issue · 8 comments
This code, using a 64-bit offset NetCDF file with an unlimited dimension, fails with current master:
using NCDatasets
using Test
sz = (4,5,3)
filename = tempname()
NCDataset(filename,"c",format=:netcdf3_64bit_offset) do ds
ds.dim["lon"] = sz[1]
ds.dim["lat"] = sz[2]
ds.dim["time"] = Inf
T = Float32
v = defVar(ds,"var-$T",T,("lon","lat","time"))
j = 1
v[:,:,j] = fill(T(j), sz[1:2])
end
This is the stack-trace:
julia> include("/home/abarth/.julia/dev/NCDatasets/test/test_variable_unlim2.jl");
ERROR: LoadError: ArgumentError: Chunk sizes must be strictly positive
Stacktrace:
[1] RegularChunks
@ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:26 [inlined]
[2] #24
@ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:133 [inlined]
[3] map (repeats 3 times)
@ ./tuple.jl:318 [inlined]
[4] DiskArrays.GridChunks(a::Tuple{Int64, Int64, Int64}, chunksize::Tuple{Int64, Int64, Int64}; offset::Tuple{Int64, Int64, Int64})
@ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:132
[5] GridChunks
@ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:131 [inlined]
[6] #GridChunks#18
@ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:130 [inlined]
[7] DiskArrays.GridChunks(a::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, chunksize::Tuple{Int64, Int64, Int64})
@ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:130
[8] eachchunk(v::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}})
@ NCDatasets ~/.julia/dev/NCDatasets/src/variable.jl:407
[9] _writeblock!(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, ::Array{Float32, 3}, ::Base.OneTo{Int64}, ::Vararg{AbstractVector})
@ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/batchgetindex.jl:187
[10] writeblock!(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, ::Array{Float32, 3}, ::Base.OneTo{Int64}, ::Vararg{AbstractVector})
@ NCDatasets ~/.julia/packages/DiskArrays/QlfRF/src/batchgetindex.jl:213
[11] setindex_disk!(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, ::Matrix{Float32}, ::Function, ::Vararg{Any})
@ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/diskarray.jl:57
[12] setindex!
@ ~/.julia/packages/DiskArrays/QlfRF/src/diskarray.jl:229 [inlined]
[13] setindex!(::CommonDataModel.CFVariable{Float32, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Vararg{Nothing, 5}}}}, ::Matrix{Float32}, ::Colon, ::Colon, ::Int64)
@ CommonDataModel ~/.julia/dev/CommonDataModel/src/cfvariable.jl:419
[14] (::var"#29#30")(ds::NCDataset{Nothing})
@ Main ~/.julia/dev/NCDatasets/test/test_variable_unlim2.jl:18
[15] NCDataset(::var"#29#30", ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Symbol, Tuple{Symbol}, NamedTuple{(:format,), Tuple{Symbol}}})
@ NCDatasets ~/.julia/dev/NCDatasets/src/dataset.jl:255
[16] top-level scope
@ ~/.julia/dev/NCDatasets/test/test_variable_unlim2.jl:7
[17] include(fname::String)
@ Base.MainInclude ./client.jl:478
[18] top-level scope
@ REPL[36]:1
in expression starting at /home/abarth/.julia/dev/NCDatasets/test/test_variable_unlim2.jl:7
I am using DiskArrays 0.3.22. Note that 64-bit offset NetCDF files are always unchunked. The error does not appear with NetCDF4 files or with NCDatasets 0.12.
@tcarion or @rafaqz, do you have any ideas?
My attempt to reproduce this with NetCDF.jl fails with a different error:
import NetCDF
fname2 = tempname()
time_dim = NetCDF.NcDim("time",Int[],unlimited=true);
lon_dim = NetCDF.NcDim("lon",1:3);
v = NetCDF.NcVar("obs",[lon_dim,time_dim]);
NetCDF.create(fname2,v; mode=NetCDF.NC_64BIT_OFFSET) do nc
nc["obs"][:,1:3] = rand(3,3)
end;
which produces Not a valid data type or _FillValue type mismatch.
Hmm
@meggart recently added a check to avoid broken chunks (zero or negative size)
This is explicitly erroring now instead of allowing an object with broken chunks. It is good to have that guarantee in most cases, except yours, where you are trying to break the chunks!
I have warned that the chunks would break if you used setindex! like this ;)
But I don't know what to do besides removing the safety check and letting you have broken chunks. I do have a simple solution that will take a few hours to implement, but you know all about that...
I think the problem in this issue is not that difficult.
For a NetCDF 3 file, the NetCDF library always reports a variable as unchunked, and the chunk size defaults to the initial array size (here, for example, (4,5,0)). But technically, for variables with unlimited dimensions, the data is actually chunked in NetCDF 3 (see e.g. https://www.unidata.ucar.edu/software/netcdf/workshops/2007/performance/FileFormat.html).
Simply replacing any zero in the chunk size by one (in NCDatasets) gives the actual chunk size on disk and may solve this issue.
See: https://github.com/Alexander-Barth/NCDatasets.jl/pull/232/files
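The zero-to-one replacement could be sketched like this (illustrative helper name, not the actual PR code):

```julia
# Illustrative sketch (not the actual PR code): replace any zero in the
# reported chunk size by one, so that DiskArrays accepts variables whose
# unlimited dimension currently has length zero.
fix_chunksize(chunksize) = max.(chunksize, 1)

fix_chunksize((4, 5, 0))  # (4, 5, 1): the effective record size on disk
```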
Well, it seems slightly hacky, but if it works!
Yes, it seems quite hacky. I am wondering whether the NetCDF library is actually correct to report variables with unlimited dimensions as contiguous while they are saved in a non-contiguous manner on disk.
Is it possible to check for unlimited variables manually when you load a file, and effectively ignore what netcdf reports?
Yes, this is certainly possible, but it could cause some confusion. Maybe NCDatasets.chunking should just report what nc_inq_var_chunking says, but for DiskArrays (and NCDatasets.getchunksize) we would use the chunk size that corresponds to the contiguous data blocks on file (for performance).
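That split could look roughly like the following sketch; chunking and getchunksize are the real NCDatasets function names, but the bodies here are illustrative assumptions, not the actual implementation:

```julia
# Illustrative sketch: report the storage layout exactly as the NetCDF
# library does, but hand DiskArrays an effective chunk size where any
# zero-length (unlimited) dimension is treated as having one record.
reported_chunking(chunksize) = (:contiguous, chunksize)  # what nc_inq_var_chunking says
effective_chunksize(chunksize) = max.(chunksize, 1)      # used for DiskArrays

reported_chunking((4, 5, 0))
effective_chunksize((4, 5, 0))  # (4, 5, 1)
```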
Yes, that makes sense: the effective chunk pattern is different from the official chunk pattern.
Resolved in a0ca027.