netcdf dimensions get messed up if they are out of order and if axis attributes are available
Opened this issue ยท 12 comments
Function .get_dims()
sorts dimensions according to the canonical axis order if "axis" attributes are provided by the netcdf.
However, this doesn't seem to work as expected if the dimensions in the netcdf are not already sorted. This seems to be caused by line 521 of file R/ncdf.R (https://github.com/r-spatial/stars/blob/f0156c9266ba16a8c4efc8da9702f053e98c871a/R/ncdf.R#L521C5-L521C48) which uses
`dims` <- dims[match(dims$id, dim_matcher), ]`
instead of
`dims <- dims[match(dim_matcher, dims$id, nomatch = 0L), ]`
(or similar).
It would be great if stars could read such netcdfs, thanks!
packageVersion("stars")
#> [1] '0.6.5'
packageVersion("ncmeta")
#> [1] '0.4.0'
packageVersion("sf")
#> [1] '1.0.16'
# Demonstration
dim_matcher <- c(1L, 0L, 3L)
dims <- data.frame(
id = c(3L, 1L, 0L),
name = c("time", "lon", "lat"),
length = c(9, 2, 3),
unlim = c(FALSE, FALSE, FALSE),
coord_dim = c(TRUE, TRUE, TRUE),
coord_var = c("time", "lon", "lat"),
axis = c("T", "X", "Y")
)
# Existing implementation: axis order ends up as Y - T - X
dims[match(dims$id, dim_matcher), ]
#> id name length unlim coord_dim coord_var axis
#> 3 0 lat 3 FALSE TRUE lat Y
#> 1 3 time 9 FALSE TRUE time T
#> 2 1 lon 2 FALSE TRUE lon X
# Suggested fix: axis order is now X - Y - T
dims[match(dim_matcher, dims$id, nomatch = 0L), ]
#> id name length unlim coord_dim coord_var axis
#> 2 1 lon 2 FALSE TRUE lon X
#> 3 0 lat 3 FALSE TRUE lat Y
#> 1 3 time 9 FALSE TRUE time T
# Example with two netCDFs
tas <- array(
data = rowSums(
expand.grid(seq_len(9), 10 * (seq_len(2) - 1), 100 * (seq_len(3) - 1))
),
dim = c(9, 2, 3)
)
# File1 has no "axis" attributes
file1 <- tempfile("tas_example_", fileext = ".nc")
nc <- RNetCDF::create.nc(file1)
id_lat <- RNetCDF::dim.def.nc(nc, "lat", 3)
iv_lat <- RNetCDF::var.def.nc(nc, "lat", "NC_FLOAT", id_lat)
RNetCDF::var.put.nc(nc, "lat", c(40, 45, 50))
id_lon <- RNetCDF::dim.def.nc(nc, "lon", 2)
iv_lon <- RNetCDF::var.def.nc(nc, "lon", "NC_FLOAT", id_lon)
RNetCDF::var.put.nc(nc, "lon", c(-100, -95))
id_bnds <- RNetCDF::dim.def.nc(nc, "bnds", 2)
id_time <- RNetCDF::dim.def.nc(nc, "time", 9)
iv_time <- RNetCDF::var.def.nc(nc, "time", "NC_INT", id_time)
RNetCDF::var.put.nc(nc, "time", 1:9)
iv_tas <- RNetCDF::var.def.nc(nc, "temperature", "NC_FLOAT", c(id_time, id_lon, id_lat))
RNetCDF::var.put.nc(nc, "temperature", tas)
RNetCDF::close.nc(nc)
# File2 has X, Y, T axis attributes
file2 <- tempfile("tas_example_", fileext = ".nc")
file.copy(from = file1, to = file2)
#> [1] TRUE
nc <- RNetCDF::open.nc(file2, write = TRUE)
RNetCDF::att.put.nc(nc, "lon", "axis", "NC_CHAR", "X")
RNetCDF::att.put.nc(nc, "lat", "axis", "NC_CHAR", "Y")
RNetCDF::att.put.nc(nc, "time", "axis", "NC_CHAR", "T")
RNetCDF::close.nc(nc)
# Read example files with `read_ncdf()`
stars::read_ncdf(file1) # Works well (no "axis" attributes -- `dim_matcher` is all NA)
#> no 'var' specified, using temperature
#> other available variables:
#> lat, lon, time
#> Will return stars object with 54 cells.
#> Warning in .get_nc_projection(meta$attribute, rep_var, cv): No projection
#> information found in nc file.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> temperature 1 15.25 110 110 204.75 219
#> dimension(s):
#> from to offset delta x/y
#> time 1 9 0.5 1 [x]
#> lon 1 2 -102.5 5 [y]
#> lat 1 3 37.5 5
stars::read_ncdf(file2) # Doesn't work as expected: dimensions are messed up
#> no 'var' specified, using temperature
#> other available variables:
#> lat, lon, time
#> Will return stars object with 54 cells.
#> Warning in .get_nc_time(dimensions, make_time, coord_var, rep_var, meta):
#> ignoring units of time dimension, not able to interpret
#> Warning in .get_nc_time(dimensions, make_time, coord_var, rep_var, meta): No
#> projection information found in nc file.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> temperature 1 15.25 110 110 204.75 219
#> dimension(s):
#> from to offset delta x/y
#> lat 1 9 37.5 5 [x]
#> time 1 2 1 1 [y]
#> lon 1 3 -102.5 5
# Clean up
unlink(file1)
unlink(file2)
Created on 2024-04-17 with reprex v2.1.0
Nice ;-)
We also have:
(x1 = stars::read_ncdf(file2))
# no 'var' specified, using temperature
# other available variables:
# lat, lon, time
# Will return stars object with 54 cells.
# stars object with 3 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature 1 15.25 110 110 204.75 219
# dimension(s):
# from to offset delta x/y
# lat 1 9 37.5 5 [x]
# time 1 2 1 1 [y]
# lon 1 3 -102.5 5
# Warning messages:
# 1: In .get_nc_time(dimensions, make_time, coord_var, rep_var, meta) :
# ignoring units of time dimension, not able to interpret
# 2: In .get_nc_projection(meta$attribute, rep_var, cv) :
# No projection information found in nc file.
(x2 = stars::read_stars(file2))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# tas_example_5cbe857ee56c4.nc 1 15.25 110 110 204.75 219
# dimension(s):
# from to offset delta x/y
# x 1 9 0 1 [x]
# y 1 2 2 -1 [y]
# lat 1 3 40 5
(x3 = stars::read_mdim(file2))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature 1 15.25 110 110 204.75 219
# dimension(s):
# from to offset delta x/y
# time 1 9 0.5 1 [x]
# lon 1 2 -102.5 5 [y]
# lat 1 3 40 5
identical(as.vector(x1[[1]]), as.vector(x2[[1]]))
# [1] FALSE
identical(as.vector(x1[[1]]), as.vector(x3[[1]]))
# [1] TRUE
But also
identical(as.vector(x1[[1]]), as.vector(x2[[1]][,2:1,]))
# [1] TRUE
so that is because GDAL reverts the y axis direction; and
stars::read_mdim(file2, raster = c("lon", "lat"))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature 1 15.25 110 110 204.75 219
# dimension(s):
# from to offset delta x/y
# time 1 9 0.5 1
# lon 1 2 -102.5 5 [x]
# lat 1 3 40 5 [y]
where offset is ok for lon, but not for time and lat.
@mdsumner or @dblodgett-usgs could one of you take a look at the .get_dims()
issue?
I think this is all good -- thanks for the thorough report @dschlaep !! I incorporated your reprex into a test and made the change.
Question for @edzer I'm looking at the print.dimensions
method and tracked to here: https://github.com/r-spatial/stars/blob/main/R/dimensions.R#L658
For the base example (without the axis attribute) we have:
netcdf tas_example_4ed43eafb7d {
dimensions:
lat = 3 ;
lon = 2 ;
bnds = 2 ;
time = 9 ;
variables:
float lat(lat) ;
float lon(lon) ;
int time(time) ;
float temperature(lat, lon, time) ;
data:
lat = 40, 45, 50 ;
lon = -100, -95 ;
time = 1, 2, 3, 4, 5, 6, 7, 8, 9 ;
temperature =
1, 2, 3, 4, 5, 6, 7, 8, 9,
11, 12, 13, 14, 15, 16, 17, 18, 19,
101, 102, 103, 104, 105, 106, 107, 108, 109,
111, 112, 113, 114, 115, 116, 117, 118, 119,
201, 202, 203, 204, 205, 206, 207, 208, 209,
211, 212, 213, 214, 215, 216, 217, 218, 219 ;
}
In this example, there's nothing to go on to figure out that the 'lat' and 'lon' variables should be interpreted as latitude and longitude, so the [x]
and [y]
labels aren't right. If I add standard_name and units it still gets the axis labels wrong.
netcdf tas_example_4ed4569a6c6 {
dimensions:
lat = 3 ;
lon = 2 ;
bnds = 2 ;
time = 9 ;
variables:
float lat(lat) ;
lat:standard_name = "latidude" ;
lat:units = "degrees" ;
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:units = "degrees" ;
int time(time) ;
time:standard_name = "time" ;
time:units = "days since 1900-01-01" ;
float temperature(lat, lon, time) ;
data:
lat = 40, 45, 50 ;
lon = -100, -95 ;
time = 1, 2, 3, 4, 5, 6, 7, 8, 9 ;
temperature =
1, 2, 3, 4, 5, 6, 7, 8, 9,
11, 12, 13, 14, 15, 16, 17, 18, 19,
101, 102, 103, 104, 105, 106, 107, 108, 109,
111, 112, 113, 114, 115, 116, 117, 118, 119,
201, 202, 203, 204, 205, 206, 207, 208, 209,
211, 212, 213, 214, 215, 216, 217, 218, 219 ;
}
I see:
stars object with 3 dimensions and 1 attribute
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max.
temperature 1 15.25 110 110 204.75 219
dimension(s):
from to offset delta x/y
time 1 9 1900-01-02 UTC 1 days [x]
lon 1 2 -102.5 5 [y]
lat 1 3 37.5 5
Should the dimension order be returning in x, y, t order here since we know what the order is? I'm don't remember how many places in stars
this assumption is being made vs doing a little more introspection to figure out what the x and y axes are.
@dblodgett-usgs thanks!
If I add standard_name and units it still gets the axis labels wrong.
If you change latidude
into latitude
, we get
> stars::read_ncdf("x.nc")
no 'var' specified, using temperature
other available variables:
lat, lon, time
Will return stars object with 54 cells.
No projection information found in nc file.
Coordinate variable units found to be degrees,
assuming WGS84 Lat/Lon.
stars object with 3 dimensions and 1 attribute
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max.
temperature 1 15.25 110 110 204.75 219
dimension(s):
from to offset delta refsys x/y
lon 1 3 -102.5 5 WGS 84 (CRS84) [x]
lat 1 9 37.5 5 WGS 84 (CRS84) [y]
time 1 2 1900-01-02 UTC 1 days POSIXct
which gets the names right, but the dimensions wrong (lon
should be 2, lat
3, time
9).
This adds some dim name matching to read_mdim()
:
stars::read_mdim("x.nc")
# stars object with 3 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature 1 15.25 110 110 204.75 219
# dimension(s):
# from to offset delta refsys x/y
# time 1 9 1900-01-02 1 days Date
# lon 1 2 -102.5 5 WGS 84 (CRS84) [x]
# lat 1 3 40 5 WGS 84 (CRS84) [y]
stars::read_mdim("x.nc") |> aperm(c(2,3,1))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# temperature 1 15.25 110 110 204.75 219
# dimension(s):
# from to offset delta refsys x/y
# lon 1 2 -102.5 5 WGS 84 (CRS84) [x]
# lat 1 3 40 5 WGS 84 (CRS84) [y]
# time 1 9 1900-01-02 1 days Date
@edzer -- see one more minor PR updating the test.
Still, read_ncdf
gets the dimensions wrong:
from to offset delta refsys x/y
lon 1 3 -102.5 5 WGS 84 (CRS84) [x]
lat 1 9 37.5 5 WGS 84 (CRS84) [y]
time 1 2 1900-01-02 UTC 1 days POSIXct
Sorry, what do you see that's wrong there?
time
should range from 1 to 9, lat
from 1 to 3, lon
from 1 to 2.
(note to self: read_mdim()
gets the lat
offset wrong)
I see -- ok. I'll get in and see where that went wrong as soon as I can.