easystats/datawizard

Select-helper with "." doesn't work as expected?

strengejacke opened this issue · 6 comments

The following code (from readme) is supposed to drop Species, but it doesn't.

library(datawizard)
data(iris)
iris |>
  # all rows where Species is "versicolor" or "virginica"
  data_filter(Species %in% c("versicolor", "virginica")) |>
  # select only columns with "." in names (i.e. drop Species)
  data_select(contains(".")) |> 
  head()
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 51          7.0         3.2          4.7         1.4 versicolor
#> 52          6.4         3.2          4.5         1.5 versicolor
#> 53          6.9         3.1          4.9         1.5 versicolor
#> 54          5.5         2.3          4.0         1.3 versicolor
#> 55          6.5         2.8          4.6         1.5 versicolor
#> 56          5.7         2.8          4.5         1.3 versicolor

Created on 2023-06-13 with reprex v2.0.2

The proper regex is "\\."; an unescaped "." matches any character, so it selects everything:

library(datawizard)

iris |>
  data_select(contains("\\.")) |> 
  head()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
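
The same difference can be reproduced with base R's grepl() alone, independent of datawizard. This is only a sketch of the matching behaviour; the variable names are illustrative and not taken from the package.

```r
# Column names of the built-in iris data set
cols <- names(iris)

# An unescaped "." is a regex wildcard: it matches any single character,
# so every non-empty column name matches, including "Species"
any_char <- grepl(".", cols)

# Escaping the dot matches a literal "." only
literal_dot <- grepl("\\.", cols)

# fixed = TRUE also treats "." literally, without any regex escaping
fixed_dot <- grepl(".", cols, fixed = TRUE)

cols[any_char]
cols[literal_dot]
```

Note that fixed = TRUE cannot be combined with ignore.case = TRUE in grepl(), which may be one reason to prefer escaping when case-insensitive matching is needed.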

I can update the readme.

But regex is set to FALSE.

I think the problematic code is here:

.select_helper <- function(expr, data, ignore_case, regex, verbose) {
  lst_expr <- as.list(expr)

  # need this if condition to distinguish between starts_with("Sep") (that we
  # can use directly) and starts_with(i) (where we need to get i)
  if (length(lst_expr) == 2L && typeof(lst_expr[[2]]) == "symbol") {
    collapsed_patterns <- .dynGet(lst_expr[[2]], inherits = FALSE, minframe = 0L)
  } else {
    collapsed_patterns <- paste(unlist(lst_expr[2:length(lst_expr)]), collapse = "|")
  }

  helper <- insight::safe_deparse(lst_expr[[1]])

  rgx <- switch(helper,
    "starts_with" = paste0("^(", collapsed_patterns, ")"),
    "ends_with" = paste0("(", collapsed_patterns, ")$"),
    "contains" = paste0("(", collapsed_patterns, ")"),
    "regex" = collapsed_patterns,
    insight::format_error("There is no select helper called '", helper, "'.")
  )
  grep(rgx, colnames(data), ignore.case = ignore_case)
}

The regex argument is passed, but never evaluated.
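
To see why the readme example selects everything, the pattern construction for contains(".") can be traced by hand. The sketch below copies only the paste0() and grep() steps from the function above. escape_regex() is a hypothetical helper, not part of datawizard; it shows one way a literal match could be obtained if that behaviour were wanted.

```r
# Pattern as the "contains" branch above would build it for contains(".")
collapsed_patterns <- paste(".", collapse = "|")
rgx <- paste0("(", collapsed_patterns, ")")

# "(.)" matches any character, so all five iris columns are returned
selected <- grep(rgx, colnames(iris), ignore.case = FALSE)

# Hypothetical helper (an assumption, not datawizard API): backslash-escape
# regex metacharacters so that the pattern is matched literally
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# With the dot escaped, only the four columns containing "." are returned
rgx_literal <- paste0("(", escape_regex("."), ")")
selected_literal <- grep(rgx_literal, colnames(iris), ignore.case = FALSE)
```

Whether such an escaping step belongs in .select_helper() at all depends on the intended meaning of regex for select-helpers, which is the question discussed in the rest of this thread.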

If we want a different behaviour, we should mention this somewhere in the code. The code from the readme used to work, i.e. contains(".") selected everything with dots in the name when regex was FALSE.

But I thought regex only applies when select is a character (and that's what the docs say), so whether regex is TRUE or FALSE shouldn't matter in the example above.

OK, I think the docs are fine. regex only changes how the select string is interpreted when no select-helper is used; with select-helpers, the pattern is always treated as a regular expression.
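
Read that way, there are two routes to selecting the dotted columns. The following is a short sketch, assuming an installed datawizard version in which the regex argument of data_select() behaves as the docs describe; the object names are illustrative.

```r
library(datawizard)

# Without a select-helper: select is a character string, and
# regex = TRUE asks for it to be treated as a regular expression
by_string <- data_select(iris, select = "\\.", regex = TRUE)

# With a select-helper: the pattern is always a regular expression,
# so the dot has to be escaped regardless of the regex argument
by_helper <- data_select(iris, contains("\\."))

names(by_string)
names(by_helper)
```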

But I thought regex only applied when select was a character (and that's what the docs say), so whether regex is TRUE or FALSE shouldn't matter in the example above.

Yes. It may have been an old artifact in the readme examples.