edzer/spacetime

stConstruct does not work with tibbles

Opened this issue · 5 comments

The different subsetting behavior of tibbles seems to break stConstruct. Here's a reprex:

dat_df <- data.frame(
  z = c(35, 42, 49, 47),
  proc = "Tmax",
  lat = c(39.35, 39.35, 40.1, 40.1),
  lon = -81.43,
  date = as.Date(c("1990-01-01", "1990-01-02", "1990-01-01", "1990-01-02"))
)

# This works
stobj <- spacetime::stConstruct(
  x = dat_df,
  space = c("lon", "lat"),
  time = "date"
)
stobj
#> An object of class "STIDF"
#> Slot "data":
#>    z proc
#> 1 35 Tmax
#> 3 49 Tmax
#> 2 42 Tmax
#> 4 47 Tmax
#> 
#> Slot "sp":
#> SpatialPoints:
#>        lat    lon
#> [1,] 39.35 -81.43
#> [2,] 40.10 -81.43
#> [3,] 39.35 -81.43
#> [4,] 40.10 -81.43
#> Coordinate Reference System (CRS) arguments: NA 
#> 
#> Slot "time":
#>            timeIndex
#> 1990-01-01         1
#> 1990-01-01         3
#> 1990-01-02         2
#> 1990-01-02         4
#> 
#> Slot "endTime":
#> [1] "1990-01-01 UTC" "1990-01-01 UTC" "1990-01-02 UTC" "1990-01-02 UTC"

# This does not
dat_tib <- tibble::as_tibble(dat_df)
stobj2 <- spacetime::stConstruct(
  x = dat_tib,
  space = c("lon", "lat"),
  time = "date"
)
#> Error: `i` must have one dimension, not 2.

Created on 2020-08-31 by the reprex package (v0.3.0)

The underlying issue is that data.frame can be subset by an xts object...

x <- dat_tib
ti <- which(names(x) %in% "date")
time <- xts::xts(matrix(1:nrow(x), dimnames = list(NULL, "timeIndex")), x[[ti]])
dat_df[time,]
#    z proc   lat    lon       date
# 1 35 Tmax 39.35 -81.43 1990-01-01
# 3 49 Tmax 40.10 -81.43 1990-01-01
# 2 42 Tmax 39.35 -81.43 1990-01-02
# 4 47 Tmax 40.10 -81.43 1990-01-02

...while a tibble cannot:

dat_tib[time,]
# Error: `i` must have one dimension, not 2.

Digging deeper, this actually seems to be because of different behavior between df[i,] and tib[i,] when i is a matrix. It looks like data frames will silently coerce the matrix into a vector, while tibbles throw an error:

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
imat <- matrix(ncol = 2, byrow = TRUE, c(
  1, 1,
  2, 2
))
iris[imat,]
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1            5.1         3.5          1.4         0.2  setosa
#> 2            4.9         3.0          1.4         0.2  setosa
#> 1.1          5.1         3.5          1.4         0.2  setosa
#> 2.1          4.9         3.0          1.4         0.2  setosa
c(imat)
#> [1] 1 2 1 2
iris[c(imat),]
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1            5.1         3.5          1.4         0.2  setosa
#> 2            4.9         3.0          1.4         0.2  setosa
#> 1.1          5.1         3.5          1.4         0.2  setosa
#> 2.1          4.9         3.0          1.4         0.2  setosa
tibble::as_tibble(iris)[imat,]
#> Error: `i` must have one dimension, not 2.

Created on 2020-08-31 by the reprex package (v0.3.0)

I think the solution might be to convert the xts time object to an integer vector and use that for subsetting:

time
#            timeIndex
# 1990-01-01         1
# 1990-01-01         3
# 1990-01-02         2
# 1990-01-02         4
itime <- as.integer(time)
itime
# [1] 1 3 2 4
dat_df[itime,]
#    z proc   lat    lon       date
# 1 35 Tmax 39.35 -81.43 1990-01-01
# 3 49 Tmax 40.10 -81.43 1990-01-01
# 2 42 Tmax 39.35 -81.43 1990-01-02
# 4 47 Tmax 40.10 -81.43 1990-01-02
dat_tib[itime,]
# # A tibble: 4 x 5
#       z proc    lat   lon date      
#   <dbl> <fct> <dbl> <dbl> <date>    
# 1    35 Tmax   39.4 -81.4 1990-01-01
# 2    49 Tmax   40.1 -81.4 1990-01-01
# 3    42 Tmax   39.4 -81.4 1990-01-02
# 4    47 Tmax   40.1 -81.4 1990-01-02

Here's a full backtrace:

17: vec_as_location(i, n, arg = as_label(i_arg))
16: force(expr)
15: doTryCatch(return(expr), name, parentenv, handler)
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])
13: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch(force(expr), vctrs_error_subscript = function(cnd) {
        cnd$subscript_arg <- i_arg
        cnd$subscript_elt <- "row"
        if (isTRUE(assign) && !isTRUE(cnd$subscript_action %in% c("negate"))) {
            cnd$subscript_action <- "assign"
        }
        cnd_signal(cnd)
    })
11: subclass_row_index_errors(vec_as_location(i, n, arg = as_label(i_arg)), 
        i_arg = i_arg, assign = assign)
10: vectbl_as_row_location(i, nr, i_arg, assign)
9: vectbl_as_row_index(i, x, i_arg)
8: tbl_subset_row(xo, i = i, i_arg)
7: `[.tbl_df`(x, time, -c(si, ti), drop = FALSE)
6: x[time, -c(si, ti), drop = FALSE]
5: initialize(value, ...)
4: initialize(value, ...)
3: new("STIDF", STI(sp, time, endTime), data = data)
2: STIDF(sp[time, ], time, x[time, -c(si, ti), drop = FALSE], endTime)
1: spacetime::stConstruct(x = dat_tib, space = c("lon", "lat"), 
       time = "date")
edzer commented

Thanks for the clear diagnosis! I'd be open for a well tested PR, but active development actually really takes place in package stars now, spacetime is in maintenance mode.

Understood, thanks! I'll put together a PR when I get a chance.

FWIW, I came across spacetime because it features prominently in the recently released book "Spatio-Temporal Statistics With R". Depending on how popular that becomes, you may see a lot of use of this package! Though I'll definitely point people in our reading group to stars (which is awesome, by the way -- I'm a big fan!).

edzer commented

I know, I read it and am a big fan! Yes it is always a pain - writing the software, and then writing books takes years, and chances are that all kind of things changed over those years. I'll get in touch with @andrewzm to see if we can update his book materials at some stage. stars appeared two years ago on CRAN, but it took a while before it was useful!

👍 Feel free to let him know that I and likely others in our discussion group would be willing to help revise some of the labs to reflect current best practices. If he had a public GitHub repository for the lab code, I'd be happy to contribute pull-requests.