FR: Check Excel sheets and consistency with readr
Opened this issue · 0 comments
olivroy commented
Hi,
I was wondering if it would be possible to add a check for Excel sheets?
When I miswrite the Excel sheet, I have to go investigate the Excel file, but I wrote a small helper to help for that with rlang::arg_match()
that gives helpful error message.
Also, would it be possible to rename the path
argument to file
, just like in readr, and .name_repair
to name_repair
since all other arguments in read_excel()
don't start with .
.
Thanks
library(readxl)
read_excel(readxl_example("datasets.xlsx"), sheet = "iris2")
#> Error: Sheet 'iris2' not found
# Helper with consistency with readr
read_excel_check <- function(file, sheet = NULL, name_repair = "unique") {
if (!is.null(sheet) && rlang::is_string(sheet)) {
all_sheets <- readxl::excel_sheets(path = path)
rlang::arg_match(sheet, all_sheets)
}
read_excel(
path = file,
sheet = sheet,
.name_repair = name_repair
)
}
read_excel_check(
readxl_example("datasets.xlsx"),
sheet = "iris2",
name_repair = "universal"
)
#> Error in `read_excel_check()`:
#> ! `sheet` must be one of "iris", "mtcars", "chickwts", or "quakes", not
#> "iris2".
#> ℹ Did you mean "iris"?
Created on 2023-04-14 with reprex v2.0.2
Edit: This addition is costly in terms of performance. For a small datasets, it increases computing time by ~30%, but it reduces when the data is larger, and only check is sheet
is not NULL
bench::mark(
x1 = read_excel(readxl_example("datasets.xlsx"), sheet = "iris"),
x2 = read_excel_check(readxl_example("datasets.xlsx"), sheet = "iris")
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x1 81.3ms 86.1ms 11.3 1.01MB 2.27
#> 2 x2 110.1ms 112.1ms 8.53 227.62KB 0