r-lib/pillar

`sf` objects error on print if `sf` is not loaded

mikemahoney218 opened this issue · 5 comments

Apologies if I'm in the wrong spot for this issue!

If we create an sf object with classes "sf", "tbl_df", "tbl", "data.frame" and save it via `save()`:

```r
library(sf)
x <- read_sf(system.file("shape", "nc.shp", package = "sf", mustWork = TRUE))
save(x, file = "x.rda")
```

If we then load it in a new R session without loading sf, the geometry column often causes an error when printing:

```r
# Load an sf object without loading sf:
load("x.rda")
# Printing without any processing works, but doesn't print nicely:
x
#> (output snipped)
# Printing after processing fails:
dplyr::mutate(x, y = 1)
#> Error in `vec_size()`:
#> ! `x` must be a vector, not a <sfc_MULTIPOLYGON/sfc> object.
#> Run `rlang::last_error()` to see where the error occurred.
# This is "sticky", as printing now _always_ fails:
x
#> Error in `vec_size()`:
#> ! `x` must be a vector, not a <sfc_MULTIPOLYGON/sfc> object.
#> Run `rlang::last_error()` to see where the error occurred.
# Run any sf function to load the package:
sf::sf_extSoftVersion()
# Printing now works:
dplyr::mutate(x, y = 1)
#> Simple feature collection with 100 features and 15 fields
#> (output snipped)
```

The traceback looks like this:

```
<error/vctrs_error_scalar_type>
Error in `vec_size()`:
! `x` must be a vector, not a <sfc_MULTIPOLYGON/sfc> object.
---
Backtrace:
  1. ├─base `<fn>`(x)
  2. ├─pillar:::print.tbl(x)
  3. │ └─pillar:::print_tbl(...)
  4. │   ├─base::writeLines(...)
  5. │   ├─base::format(...)
  6. │   └─pillar:::format.tbl(...)
  7. │     └─pillar:::format_tbl(...)
  8. │       └─pillar::tbl_format_setup(...)
  9. │         ├─pillar:::tbl_format_setup_dispatch(...)
 10. │         └─pillar:::tbl_format_setup.tbl(...)
 11. │           └─pillar:::df_head(x, n)
 12. │             ├─pillar:::vec_head(as.data.frame(x), n)
 13. │             │ └─vctrs::vec_size(x)
 14. │             ├─base::as.data.frame(x)
 15. │             └─tibble:::as.data.frame.tbl_df(x)
 16. │               ├─base::`[<-`(`*tmp*`, unname, value = `<named list>`)
 17. │               └─tibble:::`[<-.tbl_df`(`*tmp*`, unname, value = `<named list>`)
 18. │                 └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
 19. │                   └─tibble:::vectbl_recycle_rhs_rows(...)
 20. │                     ├─base::withCallingHandlers(...)
 21. │                     └─vctrs::vec_recycle(value[[j]], nrow)
 22. └─vctrs:::stop_scalar_type(`<fn>`(`<s_MULTIP>`), "x", `<fn>`(vec_size()))
 23.   └─vctrs:::stop_vctrs(...)
 24.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
```

This is specific to the tibble print method; the data.frame print method doesn't error:

```r
library(sf)
y <- read_sf(system.file("shape", "nc.shp", package = "sf", mustWork = TRUE))
y <- as.data.frame(y)
y <- st_as_sf(y)

save(y, file = "y.rda")
```

After restarting R:

```r
# Load an sf object without loading sf:
load("y.rda")
# This doesn't error (though it doesn't print nicely either)
dplyr::mutate(y, z = 1)
```

Thanks. It looks like we fail when computing the `vec_size()` of the sf object's geometry column if sf is not loaded:

```r
sf <- structure(list(a = 3, g = structure(list(structure(1:2, class = c(
  "XY",
  "POINT", "sfg"
))), class = c("sfc_POINT", "sfc"), precision = 0, bbox = structure(c(
  xmin = 1,
  ymin = 2, xmax = 1, ymax = 2
), class = "bbox"), crs = structure(list(
  input = NA_character_, wkt = NA_character_
), class = "crs"), n_empty = 0L)), row.names = 1L, class = c(
  "sf",
  "data.frame"
), sf_column = "g", agr = structure(c(a = NA_integer_), class = "factor", .Label = c(
  "constant",
  "aggregate", "identity"
)))

vctrs::vec_size(sf)
#> [1] 1
vctrs::vec_size(sf$a)
#> [1] 1
vctrs::vec_size(sf$g)
#> Error in `vctrs::vec_size()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.

library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1; sf_use_s2() is TRUE
vctrs::vec_size(sf$g)
#> [1] 1
```

Created on 2022-06-10 by the reprex package (v2.0.1)
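To pin down which column trips this check in an arbitrary data frame, here is a minimal sketch of a diagnostic helper. `non_vector_columns()` and the `my_geom` class are hypothetical names for illustration; the only real API used is `vctrs::vec_size()`, and the helper assumes the condition class `vctrs_error_scalar_type` seen in the traceback above.

```r
# Hypothetical diagnostic helper (not part of vctrs, pillar, or sf):
# returns the names of data frame columns that vctrs rejects as
# scalars, which is the error pillar hits when it calls vec_size().
non_vector_columns <- function(df) {
  is_scalar <- vapply(
    df,
    function(col) {
      # Capture the condition instead of letting it propagate.
      res <- tryCatch(vctrs::vec_size(col), error = function(e) e)
      inherits(res, "vctrs_error_scalar_type")
    },
    logical(1)
  )
  names(df)[is_scalar]
}

# A toy data frame whose "g" column is a classed list, standing in for
# an sfc column whose vctrs methods have not been registered.
df <- structure(
  list(a = 3, g = structure(list(1:2), class = "my_geom")),
  class = "data.frame",
  row.names = 1L
)

non_vector_columns(df)
#> [1] "g"

# Appending "list" to the class makes vctrs treat the column as a list.
class(df$g) <- c(class(df$g), "list")
non_vector_columns(df)
#> character(0)
```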

A fix on the sf end could be to add an explicit "list" class to the sfc class:

```r
sf <- structure(list(a = 3, g = structure(list(structure(1:2, class = c(
  "XY",
  "POINT", "sfg"
))), class = c("sfc_POINT", "sfc"), precision = 0, bbox = structure(c(
  xmin = 1,
  ymin = 2, xmax = 1, ymax = 2
), class = "bbox"), crs = structure(list(
  input = NA_character_, wkt = NA_character_
), class = "crs"), n_empty = 0L)), row.names = 1L, class = c(
  "sf",
  "data.frame"
), sf_column = "g", agr = structure(c(a = NA_integer_), class = "factor", .Label = c(
  "constant",
  "aggregate", "identity"
)))

class(sf$g) <- c(class(sf$g), "list")

vctrs::vec_size(sf$g)
#> [1] 1
```

Created on 2022-06-10 by the reprex package (v2.0.1)

What is the original use case, or motivation, for printing without loading the package?

I originally ran into this when adding an sf object as data to a package (in tidymodels/spatialsample#33). Because the package didn't import sf but instead referred to its functions via `::`, anything that printed the data before processing it (which we did in documentation and in tests) would error. The easy fix was to just `@import sf`, but it took a second to figure that out given the error and traceback that surfaced.

Thanks. I think this is the right thing to do. Closing; we can't really do much here.

I just realized this means you can't print data from sf objects loaded via `data()` without loading the package as well:

```r
data(boston_canopy, package = "spatialsample")
boston_canopy
#> Error in `vec_size()`:
#> ! `x` must be a vector, not a <sfc_MULTIPOLYGON/sfc> object.
```

Created on 2022-06-13 by the reprex package (v2.0.1)

Still not hard to work around (just load the package), but this is probably where the problem occurs most often.
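A minimal sketch of the "just load the package" workaround, assuming the goal is only to make sure a namespace is loaded before printing. `ensure_namespace()` is a hypothetical helper built on base R's `isNamespaceLoaded()` and `requireNamespace()`. Loading the namespace without attaching it appears to be enough, since calling `sf::sf_extSoftVersion()` fixed printing in the first reprex.

```r
# Hypothetical helper: load (but do not attach) a package's namespace
# so that its registered S3 methods, including any vctrs methods, are
# available before an object that depends on them is printed.
ensure_namespace <- function(pkg) {
  if (!isNamespaceLoaded(pkg)) {
    # requireNamespace() returns FALSE instead of erroring when the
    # package is not installed, so turn that into an explicit error.
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is needed to print this object.", call. = FALSE)
    }
  }
  invisible(TRUE)
}

# Usage, assuming spatialsample and sf are installed:
# data(boston_canopy, package = "spatialsample")
# ensure_namespace("sf")
# boston_canopy
```

Inside a package, `@import sf` (the fix used for tidymodels/spatialsample#33) achieves the same effect when the package itself is loaded.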
