allisonhorst/palmerpenguins

Include `penguins` and `penguins_raw` in base R datasets package


The palmerpenguins package is a wonderful resource, offering a better alternative to the iris dataset, and has been widely embraced by the R community in courses, workshops, blog posts, and other learning materials.

To make the data even more widely available, especially for teaching and for generating examples, it would be great to include `penguins` and `penguins_raw` in the datasets package in base R. Note that penguins is already available in Python, Julia and TensorFlow, and that the package is released under a CC0 license.

As well as including the data, we also propose updating all examples in base R that use iris to use penguins instead.

We discussed this at the R Contributors Office Hour this morning. Heather recalled that there had been an earlier call for this on Twitter (thread). Has anything further come of that (tagging @gadenbuie, @njtierney)?

Those of us in the discussion this morning (especially me and @hturner) are happy to push this forward. We thought an issue here was the best place to start, to get the insight of the package authors (tagging @allisonhorst and @apreshill). Do you support this idea? Would you like to be involved? There are a few things that will need thinking about, and we'd appreciate your input:

  • Would the data go in hard-coded, or would the data-generating scripts in the package be used?
  • Could the vignettes be included (this hasn't been done before in the datasets package)?
  • What happens with all the additional material, e.g. the art? (Presumably keep the webpage and link to it in the datasets documentation?)

How were these questions handled for the inclusion of the data in Python/Julia/TensorFlow?

Once there's a response here, the next steps are to get a conversation going in the R Contributors Slack and to prepare a case for the R Core Team, of which this issue and the links in it will form a part. The R Contributors Office Hour notes linked above also list further steps needed to make this addition to base R.