tidyverse/readxl

FR: Check Excel sheets and consistency with readr

Opened this issue · 0 comments

Hi,

I was wondering if it would be possible to add a check for Excel sheets?

When I miswrite the Excel sheet, I have to go investigate the Excel file, but I wrote a small helper to help for that with rlang::arg_match() that gives helpful error message.

Also, would it be possible to rename the path argument to file, just like in readr, and .name_repair to name_repair since all other arguments in read_excel() don't start with ..

Thanks

library(readxl)

read_excel(readxl_example("datasets.xlsx"), sheet = "iris2")
#> Error: Sheet 'iris2' not found


# Helper with consistency with readr
read_excel_check <- function(file, sheet = NULL, name_repair = "unique") {
  if (!is.null(sheet) && rlang::is_string(sheet)) {
      all_sheets <- readxl::excel_sheets(path = path)
      rlang::arg_match(sheet, all_sheets)
    }

  read_excel(
    path = file,
    sheet = sheet,
    .name_repair = name_repair
  )
}
read_excel_check(
  readxl_example("datasets.xlsx"),
  sheet = "iris2",
  name_repair = "universal"
  )
#> Error in `read_excel_check()`:
#> ! `sheet` must be one of "iris", "mtcars", "chickwts", or "quakes", not
#>   "iris2".
#> ℹ Did you mean "iris"?

Created on 2023-04-14 with reprex v2.0.2

Edit: This addition is costly in terms of performance. For a small datasets, it increases computing time by ~30%, but it reduces when the data is larger, and only check is sheet is not NULL

bench::mark(
  x1 = read_excel(readxl_example("datasets.xlsx"), sheet = "iris"),
  x2 = read_excel_check(readxl_example("datasets.xlsx"), sheet = "iris")
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x1           81.3ms   86.1ms     11.3     1.01MB     2.27
#> 2 x2          110.1ms  112.1ms      8.53  227.62KB     0