tidyverse/forcats

fct_lump() adds an extra argument

PursuitOfDataScience opened this issue · 3 comments

fct_lump() should get an extra argument to remove the Other level. Currently, a separate filtering step is needed to remove it after lumping. It would be convenient if a future version of forcats added this feature. Not a big deal, just a suggestion, since I need to do this most of the time. Thanks!
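For reference, the filtering step described above might look roughly like the sketch below. The data is made up for illustration, and fct_lump_min() stands in for whichever lumping function is actually being used.

```r
library(forcats)

# Hypothetical example data: "c" and "d" each appear only once
x <- factor(c("a", "a", "a", "b", "b", "c", "d"))

# Lump levels that appear fewer than 2 times into "Other"
lumped <- fct_lump_min(x, min = 2)

# The extra step: drop the "Other" observations, then the now-unused level
kept <- droplevels(lumped[lumped != "Other"])

levels(kept)
#> [1] "a" "b"
```

In a data frame, the same step would typically be a filter on the lumped column followed by droplevels().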

Remove it and replace it with what?

> Remove it and replace it with what?

I’m not the OP, but replace it with NA, perhaps? In theory, other = NA would work, but this generates an explicit NA level rather than true missing values, so is.na() does not report those elements as missing:

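library(forcats)
# A factor with a true missing value
x_with_na <- factor(c("a", "a", NA))
# `other` partially matches the `other_level` argument; NA becomes an explicit level
x_with_other_na <- fct_lump_min(c("a", "a", "c"), min = 2, other = NA)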

x_with_na
#> [1] a    a    <NA>
#> Levels: a
x_with_other_na
#> [1] a    a    <NA>
#> Levels: a <NA>

is.na(x_with_na)
#> [1] FALSE FALSE  TRUE
is.na(x_with_other_na)
#> [1] FALSE FALSE FALSE
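
One possible workaround, sketched here as an assumption about what the OP wants rather than as a proposed API: lump into the default "Other" level first, then remove that level with fct_recode(), which turns its observations into real missing values that is.na() does recognise. The name x_real_na is only illustrative.

```r
library(forcats)

# Lump with the default other_level = "Other"
x_lumped <- fct_lump_min(c("a", "a", "c"), min = 2)

# Recoding a level to NULL removes it; its observations become true NAs
x_real_na <- fct_recode(x_lumped, NULL = "Other")

levels(x_real_na)
#> [1] "a"
is.na(x_real_na)
#> [1] FALSE FALSE  TRUE
```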

It's not clear what is needed/wanted here, so I'm going to close the issue.