Unclear warnings and errors generated when setting levels for a factor generated from a character vector
wtimmerman-fitp opened this issue · 3 comments
When I use fct_relevel() with the levels argument, I receive a warning that does not clearly indicate what is going wrong. Similarly, when I use the levels argument in forcats::as_factor() (on the assumption that arguments in ... will be passed on to methods), I receive the error "Arguments in ... must be used". Both results are unexpected given my understanding of the function help text and of base::factor().
For background, my intention is to convert a character column into a factor column using a pre-specified list of levels (the pre-specified list is somewhat important as a check and for consistency, for reasons that I won't get into here). I have reviewed the forcats issues and don't see an exact match for this problem. I have tried three approaches:
- Using base::factor(), I can pass the vector of levels to the levels argument. This works, but it is not noisy enough when the levels provided do not match the character column I am converting to a factor.
- Using forcats::as_factor(), when I pass the levels argument I receive the error "Arguments in ... must be used." I am not clear whether I am misusing the function.
- Using forcats::fct_relevel(), I receive the warning "Outer names are only allowed for unnamed scalar atomic inputs". This comes from vctrs, and I also see it referenced in the fct_relevel() help, but it doesn't seem to apply in the reprex I've generated below.
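To make the base::factor() point concrete, here is a minimal base R sketch using toy values (not the mtcars data from the reprex). It shows that a value missing from the supplied levels silently becomes NA, and a simple setdiff() check that could be run beforehand. The variable names are illustrative only.

```r
# Toy data: "c" is not among the supplied levels.
x <- c("a", "b", "c")
lvls <- c("a", "b")

# base::factor() gives no warning; the unmatched value becomes NA.
f <- factor(x, levels = lvls)
is.na(f)

# A manual pre-check for values that have no matching level.
unmatched <- setdiff(x, lvls)
unmatched
```

A non-empty `unmatched` is the condition I would like to be told about loudly.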
My questions are:
- Should these forcats functions be generating different/more-specific warnings?
- Should these forcats functions behave differently when passed the levels argument?
- Should I be using these functions differently (or a different function altogether) given my use case?
Reprex
library(tidyverse)
mtcars2 <-
mtcars %>%
tibble::rownames_to_column(var = "make_model") %>%
dplyr::filter(
dplyr::row_number() <= 5
)
use_levels <-
mtcars2 %>%
dplyr::pull(make_model)
# this works as expected, since the provided levels will by definition match the values in the make_model column.
mtcars2_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = base::factor(
make_model,
levels = use_levels
)
)
# I don't understand why this is an error based on the as_factor() help.
mtcars2_as_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = forcats::as_factor(
make_model,
levels = use_levels
)
)
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `make_model = forcats::as_factor(make_model,
#> levels = use_levels)`.
#> Caused by error:
#> ! Arguments in `...` must be used.
#> x Problematic argument:
#> * levels = use_levels
# I don't understand why this generates this warning since use_levels does not have names
mtcars2_fct_relevel <-
mtcars2 %>%
dplyr::mutate(
make_model = forcats::fct_relevel(
make_model,
levels = use_levels
)
)
#> Warning: Outer names are only allowed for unnamed scalar atomic inputs
# When I modify use_levels to have a value not present in the column, more challenges arise.
use_levels_mod <-
c(use_levels, "Other Car")
# base::factor is not noisy enough that there are factor levels not present in the data.
mtcars2_mod_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = base::factor(
make_model,
levels = use_levels_mod
)
)
# as_factor continues to error
mtcars2_mod_as_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = forcats::as_factor(
make_model,
levels = use_levels_mod
)
)
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `make_model = forcats::as_factor(make_model,
#> levels = use_levels_mod)`.
#> Caused by error:
#> ! Arguments in `...` must be used.
#> x Problematic argument:
#> * levels = use_levels_mod
# fct_relevel generates an expected warning, but still has the
# original warning that makes little sense in this case.
mtcars2_mod_fct_relevel <-
mtcars2 %>%
dplyr::mutate(
make_model = forcats::fct_relevel(
make_model,
levels = use_levels_mod
)
)
#> Warning: Outer names are only allowed for unnamed scalar atomic inputs
#> Warning: Unknown levels in `f`: Other Car
Created on 2022-08-09 by the reprex package (v2.0.1)
Session info
sessionInfo()
#> R version 4.0.5 (2021-03-31)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
#> [5] readr_2.1.2 tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6
#> [9] tidyverse_1.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.1.2 xfun_0.31 haven_2.5.0
#> [4] gargle_1.2.0 colorspace_2.0-3 vctrs_0.4.1
#> [7] generics_0.1.3 htmltools_0.5.3 yaml_2.3.5
#> [10] utf8_1.2.2 rlang_1.0.4 pillar_1.8.0
#> [13] glue_1.6.2 withr_2.5.0 DBI_1.1.3
#> [16] dbplyr_2.2.1 readxl_1.4.0 modelr_0.1.8
#> [19] lifecycle_1.0.1 munsell_0.5.0 gtable_0.3.0
#> [22] cellranger_1.1.0 rvest_1.0.2 evaluate_0.15
#> [25] knitr_1.39 tzdb_0.3.0 fastmap_1.1.0
#> [28] fansi_1.0.3 highr_0.9 broom_1.0.0
#> [31] backports_1.4.1 scales_1.2.0 googlesheets4_1.0.0
#> [34] jsonlite_1.8.0 fs_1.5.2 hms_1.1.1
#> [37] digest_0.6.29 stringi_1.7.8 grid_4.0.5
#> [40] cli_3.3.0 tools_4.0.5 magrittr_2.0.3
#> [43] crayon_1.5.1 pkgconfig_2.0.3 ellipsis_0.3.2
#> [46] xml2_1.3.3 reprex_2.0.1 googledrive_2.0.0
#> [49] lubridate_1.8.0 assertthat_0.2.1 rmarkdown_2.14
#> [52] httr_1.4.3 rstudioapi_0.13 R6_2.5.1
#> [55] compiler_4.0.5
Oh, this is perfect! Thank you for the pointer! I think this will solve my issue. The levels named argument is there, there are no errors or warnings if an additional level is listed but not in the data, and it errors (unlike base::factor()) if a value in the data is not among the supplied levels.
I'll close the issue and look forward to fct() getting into a future release.
(Example below if anyone is curious.)
#setup ----
library(tidyverse)
fct <- function(x = character(), levels = NULL, na = character()) {
if (!is.character(x)) {
cli::cli_abort("{.arg x} must be a character vector")
}
if (!is.character(na)) {
cli::cli_abort("{.arg na} must be a character vector")
}
x[x %in% na] <- NA
if (is.null(levels)) {
levels <- unique(x)
} else if (!is.character(levels)) {
cli::cli_abort("{.arg levels} must be a character vector")
}
invalid <- setdiff(x, c(levels, NA))
if (length(invalid) > 0 ) {
cli::cli_abort(c(
"Values of {.arg x} must be members of {.arg levels}",
i = "Invalid value{?s}: {.str {invalid}}"
))
}
factor(x, levels = levels, exclude = NULL)
}
mtcars2 <-
mtcars %>%
tibble::rownames_to_column(var = "make_model") %>%
dplyr::filter(
dplyr::row_number() <= 5
)
# Match levels----
match_levels <-
mtcars2 %>%
dplyr::pull(make_model)
mtcars2_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = base::factor(
make_model,
levels = match_levels
)
)
mtcars2_fct <-
mtcars2 %>%
dplyr::mutate(
make_model = fct(
make_model,
levels = match_levels
)
)
# Add Levels ----
add_levels <-
c(match_levels, "Other Car")
mtcars2_add_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = base::factor(
make_model,
levels = add_levels
)
)
mtcars2_add_fct <-
mtcars2 %>%
dplyr::mutate(
make_model = fct(
make_model,
levels = add_levels
)
)
levels(mtcars2_add_fct$make_model)
#> [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
#> [4] "Hornet 4 Drive" "Hornet Sportabout" "Other Car"
# Miss Levels ----
miss_levels <-
match_levels[-1]
mtcars2_miss_factor <-
mtcars2 %>%
dplyr::mutate(
make_model = base::factor(
make_model,
levels = miss_levels
)
)
mtcars2_miss_fct <-
mtcars2 %>%
dplyr::mutate(
make_model = fct(
make_model,
levels = miss_levels
)
)
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `make_model = fct(make_model, levels =
#> miss_levels)`.
#> Caused by error in `fct()`:
#> ! Values of `x` must be members of `levels`
#> i Invalid value: "Mazda RX4"
Created on 2022-08-09 by the reprex package (v2.0.1)
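As noted above, fct() stays silent when a listed level never appears in the data. If that case also needs to be surfaced, a small check along these lines might help. This is only a sketch: `unused_levels()` is a hypothetical helper, not part of forcats or base R, and the toy values stand in for the real column.

```r
# Hypothetical helper (not part of forcats): report levels that never occur in the data.
unused_levels <- function(f) {
  stopifnot(is.factor(f))
  setdiff(levels(f), as.character(f))
}

# Toy factor with one level that has no matching value.
f <- factor(c("a", "b"), levels = c("a", "b", "Other Car"))
unused_levels(f)
```

Whether an unused level should warn, error, or pass quietly depends on how strictly the pre-specified list is meant to match the data.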
Also, if anyone runs into the same warning I got with fct_relevel() ("Outer names are only allowed for unnamed scalar atomic inputs"): it occurs because fct_relevel() has no levels argument, so levels = use_levels is captured by the ellipsis as a named input. Instead, pass the vector of level names (in this case, use_levels) into the ellipsis on its own, unnamed, like:
mtcars2_fct_relevel <-
mtcars2 %>%
dplyr::mutate(
make_model = forcats::fct_relevel(
make_model,
use_levels
)
)
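For anyone wondering where the "outer name" comes from, the following base R analogy may help. It is not how forcats is implemented: `collect_levels()` is a hypothetical stand-in that gathers its dots with c(), whereas forcats uses its own helpers, which warn about the outer name rather than renaming elements. The point is only that an argument supplied by name inside ... carries that name along with it.

```r
# Hypothetical stand-in for a function that gathers level names from `...`.
collect_levels <- function(f, ...) {
  c(...)
}

use_levels <- c("Mazda RX4", "Datsun 710")

# Passed by name: the argument name is attached to every element.
named <- collect_levels(NULL, levels = use_levels)
names(named)

# Passed positionally: no names are attached.
unnamed <- collect_levels(NULL, use_levels)
names(unnamed)
```

With base c(), the named call yields element names derived from "levels", while the positional call yields none, which mirrors why dropping `levels =` makes the warning go away.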