Alexander-Barth/NCDatasets.jl

Reading In "Scalar" String


I code a lot, but I'm not familiar with programming terminology, so I apologize in advance if this is trivial. I ran into an issue, and the clearest way to show it is with both Python and Julia.

First, Python

from netCDF4 import Dataset
import numpy as np

basin = "AL"
ds = Dataset("test-python.nc", "w", format="NETCDF4")
ds.createDimension("time", None)
basin_var = ds.createVariable("basin", "str", dimensions=())
basin_var[:] = np.array(basin)
ds.close()

Reading in and printing the variable using Python

ds = Dataset("test-python.nc", "r")
basin = ds["basin"][:]
print(basin)

This outputs AL (not wrapped in an array).

Replicating this with Julia NCDatasets

using NCDatasets

ds = Dataset("test-julia.nc", "c")
defVar(ds, "basin", ["AL"], ())
close(ds)

The output:

closed NetCDF NCDataset

Reading this test-julia.nc file with the Python snippet above outputs AL again. But if I try to read either test-python.nc or test-julia.nc using Julia:

ds = Dataset("test-julia.nc")
println(ds["basin"][:])
close(ds)

I get:

MethodError: no method matching zero(::Type{String})
Closest candidates are:
  zero(::Union{Type{P}, P}) where P<:Period at /mnt/ssd1/naufal/julia-1.7.2/share/julia/stdlib/v1.7/Dates/src/periods.jl:53
  zero(::AbstractIrrational) at /mnt/ssd1/naufal/julia-1.7.2/share/julia/base/irrationals.jl:150
  zero(::T) where T<:TimeType at /mnt/ssd1/naufal/julia-1.7.2/share/julia/stdlib/v1.7/Dates/src/types.jl:450
  ...

Stacktrace:
 [1] getindex(v::NCDatasets.Variable{String, 0, NCDataset{Nothing}}, indexes::Colon)
   @ NCDatasets ~/.julia/packages/NCDatasets/xVEGJ/src/variable.jl:316
 [2] getindex(v::NCDatasets.CFVariable{String, 0, NCDatasets.Variable{String, 0, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, indexes::Colon)
   @ NCDatasets ~/.julia/packages/NCDatasets/xVEGJ/src/cfvariable.jl:736
 [3] top-level scope
   @ In[48]:2
 [4] eval
   @ ./boot.jl:373 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

Environments:

CentOS Linux 7 (Core)
Julia 1.7.2
NCDatasets 0.12.13
Python 3.9.7
netCDF4 1.5.8
numpy 1.21.5

Selected Jupyter core packages...
IPython : 8.1.1
ipykernel : 6.9.1
ipywidgets : not installed
jupyter_client : 7.1.2
jupyter_core : 4.9.2
jupyter_server : not installed
jupyterlab : not installed
nbclient : 0.5.11
nbconvert : 6.4.2
nbformat : 5.1.3
notebook : 6.4.8
qtconsole : not installed
traitlets : 5.1.1

Thank you for the detailed and reproducible bug report!

In the current version, ds["basin"][] should already work, and with commit 4ed2146 your example (indexing with [:]) works too.

ds = Dataset("test-julia.nc")
println(ds["basin"][:])
# output AL

Can you confirm that this solves the issue for you?

Ah! I wasn't aware that [] was an option. I must've missed that. Sorry! But yes, that commit works! Thank you!
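
For anyone else who missed it: as far as I understand, the empty brackets are ordinary Julia indexing for a zero-dimensional array, which is what a scalar NetCDF variable (defined with dimensions ()) corresponds to. A minimal sketch in plain Julia, with no NetCDF file involved; the variable name a is only for illustration:

```julia
# A zero-dimensional array holds exactly one element and has no axes.
a = fill("AL")        # Array{String, 0}

println(ndims(a))     # number of dimensions: 0
println(size(a))      # empty tuple: ()
println(a[])          # indexing with no indices returns the element: AL
```

With NCDatasets, ds["basin"][] is the analogous call for the scalar variable in this issue.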

I think this commit may have broken something else.

using NCDatasets

basin = ["AL"]
ds = Dataset("./test-julia.nc", "c")
defDim(ds, "time", 1)
defVar(ds, "basin", basin, ("time",), deflatelevel=1)
close(ds)

Before the commit, the deflatelevel argument in the snippet above worked. After the commit, I get this error message:

NetCDF error: NetCDF: Filter error: bad id or parameters or duplicate filter (NetCDF error code: -132)

Stacktrace:
 [1] check
   @ ~/.julia/packages/NCDatasets/Frbu0/src/errorhandling.jl:25 [inlined]
 [2] nc_def_var_deflate(ncid::Int32, varid::Int32, shuffle::Bool, deflate::Bool, deflate_level::Int64)
   @ NCDatasets ~/.julia/packages/NCDatasets/Frbu0/src/netcdf_c.jl:1037
 [3] defVar(ds::NCDataset{Nothing}, name::String, vtype::DataType, dimnames::Tuple{String}; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:attrib, :deflatelevel), Tuple{Vector{Any}, Int64}}})
   @ NCDatasets ~/.julia/packages/NCDatasets/Frbu0/src/cfvariable.jl:146
 [4] _defVar(ds::NCDataset{Nothing}, name::String, data::Vector{String}, nctype::Type, dimnames::Tuple{String}; attrib::Vector{Any}, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:deflatelevel,), Tuple{Int64}}})
   @ NCDatasets ~/.julia/packages/NCDatasets/Frbu0/src/cfvariable.jl:245
 [5] #defVar#43
   @ ~/.julia/packages/NCDatasets/Frbu0/src/cfvariable.jl:198 [inlined]
 [6] top-level scope
   @ In[3]:4
 [7] eval
   @ ./boot.jl:373 [inlined]
 [8] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

Could it be this issue?

#186 (comment)
Unidata/netcdf-c#2480 (comment)

Maybe you upgraded the version of the NetCDF C library when updating NCDatasets.

See also https://alexander-barth.github.io/NCDatasets.jl/stable/variables/#NCDatasets.deflate:

chunksizes, deflatelevel, shuffle and checksum can only be set on NetCDF 4 files. Compression of strings and variable-length arrays is not supported by the underlying NetCDF library.
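
Given that restriction, one possible workaround is to pass deflatelevel only for non-string data. A minimal sketch, assuming a hypothetical helper deflate_kwargs (not part of NCDatasets) that returns keyword arguments to splat into defVar; it checks only for string element types and does not cover variable-length arrays:

```julia
# Hypothetical helper, not part of NCDatasets: choose compression keyword
# arguments based on the element type of the data to be written.
function deflate_kwargs(data::AbstractArray; level::Integer = 1)
    if eltype(data) <: AbstractString
        # String data: return no keyword arguments, so no compression is requested.
        return NamedTuple()
    else
        return (deflatelevel = level,)
    end
end

println(deflate_kwargs(["AL"]))        # no keyword arguments for strings
println(deflate_kwargs([1.0, 2.0]))    # deflatelevel for numeric data

# Intended use with the snippet from this issue (requires NCDatasets):
# defVar(ds, "basin", basin, ("time",); deflate_kwargs(basin)...)
```

Splatting an empty NamedTuple passes no keyword arguments, so the string variable would be defined without any compression request.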

Hmm... That might be it. Thanks for the info!