Should `as_duckplyr_df()` work with tibbles from `readr::read_csv()`?
Opened this issue · 2 comments
andreranza commented
suppressPackageStartupMessages(library(duckplyr))
df1 <- tibble::tibble(col = "A")
temp_file <- tempfile(fileext = ".csv")
readr::write_csv(df1, temp_file)
# tibble from a csv
df_duck_tib <- duckplyr_df_from_file(
  temp_file,
  table_function = "read_csv_auto",
  class = class(tibble::tibble())
)
class(df_duck_tib)
#> [1] "duckplyr_df" "tbl_df" "tbl" "data.frame"
# or, data.frame from csv:
df_duck <- duckplyr_df_from_file(temp_file, table_function = "read_csv_auto")
class(df_duck)
#> [1] "duckplyr_df" "data.frame"
# however, `as_duckplyr_df()` fails on a readr tibble because of the extra `spec_tbl_df` class
spec_tbl_df <- readr::read_csv(temp_file, show_col_types = FALSE)
stopifnot("spec_tbl_df" %in% class(spec_tbl_df))
try(as_duckplyr_df(spec_tbl_df))
#> Error in as_duckplyr_df(spec_tbl_df) :
#> Must pass a plain data frame or a tibble to `as_duckplyr_df()`.
# stripping away `spec_tbl_df`
class(spec_tbl_df) <- c("tbl_df", "tbl", "data.frame")
as_duckplyr_df(spec_tbl_df)
#> # A tibble: 1 × 1
#> col
#> <chr>
#> 1 A
Created on 2024-03-15 with reprex v2.1.0
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.2 (2023-10-31)
#> os macOS Sonoma 14.3.1
#> system x86_64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Rome
#> date 2024-03-15
#> pandoc 3.1.8 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
#> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.0)
#> collections 0.3.7 2023-01-05 [1] CRAN (R 4.3.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
#> DBI 1.2.2 2024-02-16 [1] CRAN (R 4.3.2)
#> digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.0)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.0)
#> duckdb 0.9.2-1 2023-11-28 [1] CRAN (R 4.3.0)
#> duckplyr * 0.3.1 2024-03-10 [1] CRAN (R 4.3.2)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.0)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.0)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
#> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.0)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0)
#> R.oo 1.26.0 2024-01-24 [1] CRAN (R 4.3.2)
#> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> readr 2.1.5 2024-01-10 [1] CRAN (R 4.3.0)
#> reprex 2.1.0 2024-01-11 [1] CRAN (R 4.3.0)
#> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.0)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.0)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.0)
#> vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.0)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.0)
#> xfun 0.42 2024-02-08 [1] CRAN (R 4.3.2)
#> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.0)
#>
#> ──────────────────────────────────────────────────────────────────────────────
krlmlr commented
Not sure, but we can make the error message nicer by mentioning that the user might need to call `as_tibble()` or `as.data.frame()`.
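For reference, a minimal sketch of that workaround applied to the reprex above. It assumes that `tibble::as_tibble()` rebuilds the object as a plain tibble and so drops the `spec_tbl_df` subclass added by readr; the variable name `plain_tbl` is only illustrative.

```r
library(duckplyr)

# Same setup as in the reprex: write a one-column CSV and read it back with readr
temp_file <- tempfile(fileext = ".csv")
readr::write_csv(tibble::tibble(col = "A"), temp_file)
spec_tbl_df <- readr::read_csv(temp_file, show_col_types = FALSE)

# Assumption: as_tibble() returns a plain tibble without the readr subclass
plain_tbl <- tibble::as_tibble(spec_tbl_df)
class(plain_tbl)

# The plain tibble should now be accepted
df_duck <- as_duckplyr_df(plain_tbl)
class(df_duck)
```

`as.data.frame(spec_tbl_df)` should work the same way if a plain data frame is preferred over a tibble.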
nikostr commented
I noticed a similar issue for grouped data frames. Maybe it would make sense to also put the current class in the error message? Something like
Expected class "data.frame" or class "tbl_df" "tbl" "data.frame" but got class "grouped_df" "tbl_df" "tbl" "data.frame"
or some nicer variation of this?
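A rough sketch of what such a check could look like, combining the class report with the hint about `as_tibble()` and `as.data.frame()`. The helper `check_plain_df()` is hypothetical and not part of duckplyr; it uses only base R and is not meant to reflect how the package implements its check.

```r
# Hypothetical helper (not duckplyr API): accept only plain data frames or
# plain tibbles, and report the offending class vector otherwise.
check_plain_df <- function(x) {
  plain_classes <- list(
    "data.frame",
    c("tbl_df", "tbl", "data.frame")
  )
  cls <- class(x)
  is_plain <- any(vapply(plain_classes, identical, logical(1), cls))

  if (!is_plain) {
    # Format a class vector as: "a" "b" "c"
    fmt <- function(cl) paste0('"', cl, '"', collapse = " ")
    stop(
      "Expected class ", fmt(plain_classes[[1]]),
      " or class ", fmt(plain_classes[[2]]),
      " but got class ", fmt(cls), ".\n",
      "Consider calling `as_tibble()` or `as.data.frame()` first.",
      call. = FALSE
    )
  }
  invisible(x)
}

# Simulate a grouped data frame by setting the class vector directly
grouped <- structure(
  data.frame(col = "A"),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame")
)
msg <- tryCatch(check_plain_df(grouped), error = conditionMessage)
cat(msg, "\n")
```

Comparing the full class vector with `identical()` rather than using `inherits()` is deliberate here: subclasses such as `spec_tbl_df` or `grouped_df` also inherit from `tbl_df`, so an `inherits()` check would let them through.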