nc_axis row order

Based on this: https://github.com/hypertidy/ncmeta/blob/master/R/nc_axes.R#L46 the row order of the response from nc_axis is in the order of the RNetCDF::var.inq.nc()$dimid vector. Is this by design? I ask because that $dimid for a variable determines what order you put dimensions in start and count for RNetCDF::var.get.nc() and it's pretty important to get right!!

I wonder if adding a list column to the nc_var() response that would contain the $dimid vector would be safer? The fact that the row order corresponds dimension order is being used here: https://github.com/r-spatial/stars/blob/master/R/ncdf.R#L58 already so at the least there should be some testing to make sure that the row order doesn't get mucked up?

var <- "ppt"
h_nc <- RNetCDF::open.nc("https://cida.usgs.gov/thredds/dodsC/prism_v2")

dplyr::filter(nc_axes(h), variable == var)$dimension
# [1] 1 0 3

RNetCDF::var.inq.nc(h_nc, var)$dimids
# [1] 1 0 3

I think it's correct this way. I'm not sure about by design, but I definitely checked these things early on.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ncmeta)
h_nc <- RNetCDF::open.nc("https://cida.usgs.gov/thredds/dodsC/prism_v2")

nc_axes(h_nc, variables = "ppt") %>% 
 dplyr::left_join(nc_dims(h_nc), c("dimension" = "id"))
#> # A tibble: 3 x 6
#>    axis variable dimension name  length unlim
#>   <int> <chr>        <dbl> <chr>  <dbl> <lgl>
#> 1     1 ppt              1 lon     1405 FALSE
#> 2     2 ppt              0 lat      621 FALSE
#> 3     3 ppt              3 time    1476 FALSE

^{Created on 2019-03-20 by the reprex package (v0.2.1)}

Is row order of the nc_axes tibble enough? I'm cool if we want to go with that convention. It should be advertised in docs and tested at the least. I can take a crack at that when I look at nc_coord_var() too.

This the logic of axes:

f <- "https://cida.usgs.gov/thredds/dodsC/prism_v2"
x <- tidync(f)

x

Data Source (1): prism_v2 ...

Grids (5) <dimension family> : <associated variables> 

[1]   D1,D0,D3 : ppt, tmx, tmn    **ACTIVE GRID** ( 1287817380  values per variable)
[2]   D2,D3    : time_bnds
[3]   D0       : lat
[4]   D1       : lon
[5]   D3       : time

Dimensions 4 (3 active): 
  
  dim   name  length     min      max start count    dmin     dmax unlim coord_dim 
  <chr> <chr>  <dbl>   <dbl>    <dbl> <int> <int>   <dbl>    <dbl> <lgl> <lgl>     
1 D0    lat      621    24.1     49.9     1   621    24.1     49.9 FALSE TRUE      
2 D1    lon     1405  -125.     -66.5     1  1405  -125.     -66.5 FALSE TRUE      
3 D3    time    1476  9131    54025       1  1476  9131    54025   FALSE TRUE      
  
Inactive dimensions:
  
  dim   name  length   min   max unlim coord_dim 
  <chr> <chr>  <dbl> <dbl> <dbl> <lgl> <lgl>     
1 D2    tbnd       2     1     2 FALSE FALSE

See how the variables are D1, D0, D3, but the dimensions are in order of their ID. (I think that table of active dimensions should be in D1, D0, D3 order too - something to fix, but I want it to not be all cat/print anyway).

From the perspective of ncmeta,

we get all dimensions in source order

ncmeta::nc_dims(f)
# A tibble: 4 x 4
     id name  length unlim
  <dbl> <chr>  <dbl> <lgl>
1     0 lat      621 FALSE
2     1 lon     1405 FALSE
3     2 tbnd       2 FALSE
4     3 time    1476 FALSE

We get axes (instances of dimensions) for variables in coordinate order, so axis here is the 1, 2, 3 order.

ncmeta::nc_axes(f, "ppt")
# A tibble: 3 x 3
   axis variable dimension
  <int> <chr>        <dbl>
1     1 ppt              1
2     2 ppt              0
3     3 ppt              3

Loosely, the dimension id matches id in the dimension table - that's something to formalize, and I'm not sure how consistent I am. In places I use "name" on the entity table, not "variable" or "dimension", but there "axis" is not a name but an ordering, so ...

Frankly, I have gone around in circles a bit with this but somehow got it right last time. It takes me a while to get my head into it again - definitely needs to be pinned down with tests, so we can clean up the code somewhat.

There was a stage when I was unpacking the dimension strings "D1,D0,D3" but that's bad news, I really think axis is the right concept, and in context of a variable they are ordered (and labelled with an order attribute axis).

It also holds for all axes, but here the axis order label is absolute (not sure if that's good or bad):

ncmeta::nc_axes(f)
# A tibble: 14 x 3
    axis variable  dimension
   <int> <chr>         <dbl>
 1     1 lon               1
 2     2 lat               0
 3     3 time              3
 4     4 time_bnds         2
 5     5 time_bnds         3
 6     6 ppt               1
 7     7 ppt               0
 8     8 ppt               3
 9     9 tmx               1
10    10 tmx               0
11    11 tmx               3
12    12 tmn               1
13    13 tmn               0
14    14 tmn               3

Yeah -- I see what's going on now. You get the dimension order right in rows because of this:

ncmeta/R/nc_axes.R

Line 59 in cea1b14

    
           if (longest > 1L) out <- lapply(out, function(a) rep_len(a, length.out = longest))

This:

Browse[2]> out
$id
[1] 4

$name
[1] "ppt"

$type
[1] "NC_INT"

$ndims
[1] 3

$dimids
[1] 1 0 3

$natts
[1] 6

becomes:

Browse[2]> out
$id
[1] 4 4 4

$name
[1] "ppt" "ppt" "ppt"

$type
[1] "NC_INT" "NC_INT" "NC_INT"

$ndims
[1] 3 3 3

$dimids
[1] 1 0 3

$natts
[1] 6 6 6

Since dimids is the only part of the var.inq.nc that varies, the row order of the resulting tibble has the right order. It always will too.

My feeling is that it would be as good or better to put the dimid vector into a list column for each variable and drop the whole axis thing -- but maybe I'm missing the utility of the axis construct?

Happy to start writing some code around this way of doing things and just be aware of things being sorted in "axis" order too.

> s <- "https://cida.usgs.gov/thredds/dodsC/stageiv_combined"
> nc_axes(s)
# A tibble: 10 x 3
    axis variable                                        dimension
   <int> <chr>                                               <dbl>
 1     1 Total_precipitation_surface_1_Hour_Accumulation         2
 2     2 Total_precipitation_surface_1_Hour_Accumulation         3
 3     3 Total_precipitation_surface_1_Hour_Accumulation         0
 4     4 time_bounds                                             1
 5     5 time_bounds                                             0
 6     6 lon                                                     3
 7     7 lon                                                     2
 8     8 lat                                                     3
 9     9 lat                                                     2
10    10 time                                                    0