r-lib/tidyselect

read_csv(col_select = dplyr::select() variable) fails at variable declaration time

Closed this issue · 5 comments

What I expected to work:

columnsOfInterest = c("Year", "ID1", starts_with("Tmin"))
bind_rows(read_csv("file1.csv", col_select = columnsOfInterest),
          read_csv("file2.csv", col_select = columnsOfInterest))

What actually happened:

Error:
! `starts_with()` must be used within a *selecting* function.See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Error:
! `starts_with()` must be used within a *selecting* function.See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
---
Backtrace:
 1. tidyselect::starts_with("Tmin")
 3. tidyselect::peek_vars(fn = "starts_with")
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error:
! `starts_with()` must be used within a *selecting* function.See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
---
Backtrace:1. └─tidyselect::starts_with("Tmin")
 2.   ├─vars %||% peek_vars(fn = "starts_with")
 3.   └─tidyselect::peek_vars(fn = "starts_with")
 4.     └─rlang::abort(msg, call = NULL)

What the best workaround appears to be:

bind_rows(read_csv("file1.csv", col_select = c("Year", "ID1", starts_with("Tmin")),
          read_csv("file2.csv", col_select = c("Year", "ID1", starts_with("Tmin")))

It seems strange to force users loading a bunch of similar .csv files to repeatedly indicate col_select rather than declaring it once. As with many such things it's not a big deal once you figure out what to do, though I did find the linked FAQ and other readr and dplyr documentation to have no relevant content.

It'd be nice to be able to write more proper code instead of relying on find/replace to maintain many hardcoded instances of the same thing. This is, as usual, a trivialized repex—the actual c() statement is considerably more complex—and I certainly wasn't expecting which line of code a c() was declared in would influence whether the declaration was valid.

I'm putting this issue in tidyselect since it seems like this might be worth a look from the context of dplyr::select() mini-language design it appears this is the repo which covers that. From the backtrace it seems peek_vars() might be prematurely positioned.

From the backtrace it seems peek_vars() might be prematurely positioned.

This is true, but because you've called it too soon. As mentioned in the error message, selection helpers must be called in selection context. One way would be to make a selection helper, in this case:

columnsOfInterest <- function() {
  c(
    all_of(c("Year", "ID1")),
    starts_with("Tmin")
  )
}

bind_rows(
  read_csv("file1.csv", col_select = columnsOfInterest()),
  read_csv("file2.csv", col_select = columnsOfInterest())
)

This makes colsOfInterest() a selection helper that you then call in selection context.

However this will first need a fix in all_of() to return a vector of positions, like starts_with() does, otherwise this will be combining strings and numbers. Alternatively, we could also support lists, and then this would be:

columnsOfInterest <- function() {
  list(
    all_of(c("Year", "ID1")),
    starts_with("Tmin")
  )
}

This is true, but because you've called it too soon.

The mini-language failing to compose with a routine readr coding pattern really isn't something an application programmer can be held at fault for. It seems there's a tidyverse design hole here and, from a black box perspective, the most obvious structural fix appears to be for tidyselect to stop assuming construction means use.

As mentioned in the error message, selection helpers must be called in selection context.

The error message and linked help make no effort to explain what a selection context is (a search gives no hits, too) or why tidyselect conflates construction and use. If the design intent was to require lazy construction it's curious the documentation doesn't seem to mention that at all.

There is no need to be unpleasant here.

It seems there's a tidyverse design hole here and, from a black box perspective, the most obvious structural fix appears to be for tidyselect to stop assuming construction means use.

This is only an appearance. There is no way for tidyselect to work the way that you want because it's a contradiction in semantics.

I'm keeping this issue open to look into making it easier to create selection helpers that compose multiple selections (cf my previous comment) and I'll take another look at this error message.

There is no need to be unpleasant here.

No, there isn't. Please stop.

I think we could just add a little more documentation to faq-selection-context, to cover the case where you're extracting out repeated code.