Select-helper with "." doesn't work as expected?
strengejacke opened this issue · 6 comments
The following code (from the readme) is supposed to drop Species, but it doesn't.
library(datawizard)
data(iris)
iris |>
  # all rows where Species is "versicolor" or "virginica"
  data_filter(Species %in% c("versicolor", "virginica")) |>
  # select only columns with "." in names (i.e. drop Species)
  data_select(contains(".")) |>
  head()
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 51 7.0 3.2 4.7 1.4 versicolor
#> 52 6.4 3.2 4.5 1.5 versicolor
#> 53 6.9 3.1 4.9 1.5 versicolor
#> 54 5.5 2.3 4.0 1.3 versicolor
#> 55 6.5 2.8 4.6 1.5 versicolor
#> 56 5.7 2.8 4.5 1.3 versicolor
Created on 2023-06-13 with reprex v2.0.2
The proper regex is \\. (an escaped dot). Otherwise the dot matches any character and the helper selects everything:
library(datawizard)
iris |>
  data_select(contains("\\.")) |>
  head()
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3.0 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5.0 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
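To illustrate why the unescaped dot selects every column, here is a minimal base-R sketch that does not use datawizard at all. It only assumes that the helper's pattern ends up in a regular-expression match against the column names, as the result above suggests; the variable names are purely illustrative.

```r
cols <- colnames(iris)

# As a regular expression, "." matches any single character,
# so every non-empty column name matches, including "Species".
any_char <- grepl(".", cols)

# "\\." escapes the dot, so only names containing a literal "." match.
literal_dot <- grepl("\\.", cols)

# fixed = TRUE switches off regex interpretation entirely,
# which also restricts the match to a literal ".".
fixed_dot <- grepl(".", cols, fixed = TRUE)

cols[any_char]
cols[literal_dot]
cols[fixed_dot]
```

The first subset keeps all five columns, while the second and third drop Species.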
I can update the readme
But regex is set to FALSE.
I think the problematic code is here:
.select_helper <- function(expr, data, ignore_case, regex, verbose) {
  lst_expr <- as.list(expr)

  # need this if condition to distinguish between starts_with("Sep") (that we
  # can use directly) and starts_with(i) (where we need to get i)
  if (length(lst_expr) == 2L && typeof(lst_expr[[2]]) == "symbol") {
    collapsed_patterns <- .dynGet(lst_expr[[2]], inherits = FALSE, minframe = 0L)
  } else {
    collapsed_patterns <- paste(unlist(lst_expr[2:length(lst_expr)]), collapse = "|")
  }

  helper <- insight::safe_deparse(lst_expr[[1]])

  rgx <- switch(helper,
    "starts_with" = paste0("^(", collapsed_patterns, ")"),
    "ends_with" = paste0("(", collapsed_patterns, ")$"),
    "contains" = paste0("(", collapsed_patterns, ")"),
    "regex" = collapsed_patterns,
    insight::format_error("There is no select helper called '", helper, "'.")
  )
  grep(rgx, colnames(data), ignore.case = ignore_case)
}
The regex argument is passed to the function, but never evaluated.
If we want a different behaviour, we should mention this somewhere in the code. The code from the readme used to work, i.e. contains(".") selected every column with a dot in its name when regex was FALSE.
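If the old behaviour were wanted, one option would be to escape regex metacharacters whenever regex is FALSE. The sketch below is hypothetical and not datawizard code: escape_regex() and select_contains() are made-up names, and the escaping pattern is a common base-R idiom rather than anything taken from the package.

```r
# Hypothetical helper (not part of datawizard): put a backslash in front of
# regex metacharacters so that the pattern is matched literally.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# Hypothetical stand-in for the "contains" branch of a select helper that
# actually evaluates `regex`.
select_contains <- function(pattern, data, regex = FALSE, ignore_case = FALSE) {
  if (!regex) {
    pattern <- escape_regex(pattern)
  }
  rgx <- paste0("(", paste(pattern, collapse = "|"), ")")
  grep(rgx, colnames(data), ignore.case = ignore_case)
}

# Literal match: only the four columns with a "." in their name.
select_contains(".", iris)

# Regex match: "." matches any character, so all five columns are returned.
select_contains(".", iris, regex = TRUE)
```

With regex = FALSE the call returns the indices 1 to 4; with regex = TRUE it returns 1 to 5, which reproduces the result reported at the top of this issue.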
But I thought regex only applies when select is a character (and that's what the docs say), so whether regex is TRUE or FALSE shouldn't matter in the example above.
OK, I think the docs are fine. regex modulates how the select string is interpreted when we don't use select-helpers; when we do use select-helpers, the pattern is always treated as a regular expression.
But I thought regex only applied when select was a character (and that's what the docs say) so whether regex is TRUE or FALSE shouldn't matter in the example above
Yes. It was maybe an old artifact from the readme examples.