tidyverse/forcats

Make forcats 1.0.0 Tidyverse blog article more instructive

turbanisch opened this issue · 1 comment

Not sure if this is worth opening an issue, but I just went through the forcats 1.0.0 tidyverse blog article and found it mildly confusing at first glance. Adding fct_na_value_to_level() to the plot code doesn't seem to do anything; the plot printed below it is exactly the same.

We can make fct_infreq() do what we want by moving the NA from the values to the levels:

ggplot(starwars, aes(y = fct_rev(fct_infreq(fct_na_value_to_level(hair_color))))) + 
  geom_bar() + 
  labs(y = "Hair color")

I would find it more instructive to rename the NA level in the same step to show that ggplot will then properly adjust:

fct_na_value_to_level(hair_color, "missing")
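To make the difference concrete, here is a minimal sketch on a toy factor. The data is made up for illustration and does not come from the blog post; it only assumes forcats >= 1.0.0 is installed.

```r
library(forcats)

# Toy factor (hypothetical data): two real levels plus three missing values
f <- factor(c("a", "b", NA, NA, NA))

# NA is only a value here, so it does not show up among the levels
levels(f)

# Default level = NA: the missing values get their own level,
# but that level is still labelled NA
levels(fct_na_value_to_level(f))

# With a name supplied, the missing values are stored under an explicit level
levels(fct_na_value_to_level(f, "missing"))
```

With the default, the change is only in how the factor stores the missing values, which may be why it is hard to spot in a plot; the named level gives the axis a label that is visibly different.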

Otherwise this bit is easy to miss, because it only appears further down, in the context of lumping factor levels together:

starwars |> 
  mutate(
    hair_color = hair_color |> 
      fct_na_value_to_level("(Unknown)") |> 
      fct_infreq() |> 
      fct_lump_min(2, other_level = "(Other)") |> 
      fct_rev() 
  ) |> 
  ggplot(aes(y = hair_color)) + 
  geom_bar() + 
  labs(y = "Hair color")

I found the plots confusing as well. Basically, these two are the same, no?

ggplot(starwars, aes(y = fct_rev(fct_infreq(fct_na_value_to_level(hair_color))))) + 
  geom_bar() + 
  labs(y = "Hair color")

and this:

starwars |> 
  mutate(
    hair_color = hair_color |> 
      fct_na_value_to_level() |> 
      fct_infreq() |> 
      fct_rev()
  ) |> 
  ggplot(aes(y = hair_color)) + 
  geom_bar() + 
  labs(y = "Hair color")
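One way to check this without comparing plots by eye is to build the factor both ways and compare the results directly. This is a rough sketch: the toy vector below is hypothetical and only stands in for starwars$hair_color, and the native pipe needs R >= 4.1.

```r
library(forcats)

# Toy character vector standing in for starwars$hair_color (hypothetical data)
hair_color <- c("brown", "none", NA, "brown", NA, "blond", NA)

# Nested form, as in the first snippet
nested <- fct_rev(fct_infreq(fct_na_value_to_level(hair_color)))

# Piped form, as in the second snippet (without the mutate() wrapper)
piped <- hair_color |>
  fct_na_value_to_level() |>
  fct_infreq() |>
  fct_rev()

# Same calls in the same order, so the two factors should be identical
identical(nested, piped)
```

Since the pipe only rewrites the call order, both snippets hand ggplot the same factor, so identical plots are what one would expect here.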