allisonhorst/palmerpenguins

query about use of factors in the penguins dataset

I love the penguins. Thanks for putting this together.

I had one question. Do species, island, and sex need to be factors?

As an example of how leaving them as character vectors might be preferable, imagine an analysis that considers penguins on only two islands. If dplyr::filter() is used to subset the data, the island column still carries a stray factor level for the excluded island, and that level has to be dropped explicitly.
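A minimal sketch of the stray level, assuming the same dplyr and palmerpenguins setup as the reprex further down. The two islands kept here and the name `two_islands` are chosen only for illustration.

```r
suppressPackageStartupMessages(library(dplyr))
library(palmerpenguins)

# Keep penguins from two of the three islands (islands chosen for illustration).
two_islands <- penguins %>%
  filter(island %in% c("Biscoe", "Dream"))

# The factor still carries the level for the island that was filtered out.
levels(two_islands$island)
#> [1] "Biscoe"    "Dream"     "Torgersen"

# The unused level has to be dropped explicitly, e.g. with base R's droplevels().
levels(droplevels(two_islands)$island)
#> [1] "Biscoe" "Dream"

# With a character column there would be no leftover level to drop.
unique(as.character(two_islands$island))
```

The leftover level matters mostly downstream: functions such as table(), or plots that use the factor, can still show an empty entry for the excluded island unless the level is dropped first.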

Would it be feasible to move to characters in a future release?

suppressPackageStartupMessages(library(dplyr))
library(palmerpenguins)
glimpse(penguins)
#> Rows: 344
#> Columns: 8
#> $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ade…
#> $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgers…
#> $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
#> $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
#> $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
#> $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
#> $ sex               <fct> male, female, female, NA, female, male, female, mal…
#> $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 200…

Created on 2020-08-02 by the reprex package (v0.3.0)