pharmaverse/datacutr

Consider using /data-raw or internal functions for generating example data

Closed this issue · 2 comments

dgkf commented

First, I want to commend you all for having truly phenomenal example data: minimal data that highlights specific scenarios.

There are a couple of commonly used techniques for providing example data that might be more familiar to your users, or that could help improve the quality of the package.

Using /data-raw

You can provide your data-generating R script in the /data-raw directory to automatically build and publish datasets with your package. More details in the "Data" chapter from R Packages.

The data is small, so saving a version of the data that tags along with the package doesn't introduce a lot of overhead, and you avoid having to steer users to your custom-built functions (as suggested below). I would recommend this approach.
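As a rough sketch of what such a script might look like: the file name `data-raw/dm.R`, the column names, and the values below are illustrative assumptions, not the package's actual example data. In a real package the last step would usually be `usethis::use_data(dm, overwrite = TRUE)`; the sketch uses base R `save()` into a temporary directory so that it runs outside a package.

```r
# data-raw/dm.R (hypothetical file name)
# Illustrative values only; not datacutr's real example data.
dm <- data.frame(
  USUBJID = c("SUBJ-001", "SUBJ-002", "SUBJ-003"),
  RFSTDTC = c("2022-01-10", "2022-02-03", "2022-03-15"),
  stringsAsFactors = FALSE
)

# Inside a package you would typically call:
#   usethis::use_data(dm, overwrite = TRUE)
# which writes data/dm.rda. Here we write to a temporary "data" directory
# instead, so the sketch can be run anywhere.
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "dm.rda")
save(dm, file = out_file)
```

Once the `.rda` file is in the package's `data/` directory, users can refer to the dataset by name after loading the package, without sourcing anything.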

Example data-generating functions

If you don't want to ship data files, an alternative is to include functions for generating data within your package. You could imagine a function like:

datacutr::example_data("dm")

That would expose these data directly, so users don't have to source an installed script.
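A minimal sketch of how such a function could be written, assuming a lookup of generator functions keyed by dataset name. `example_data()` is the hypothetical name from the suggestion above, not an existing datacutr export, and the columns and values are made up for illustration.

```r
# Hypothetical helper: returns a small example dataset by name.
example_data <- function(name) {
  # Each entry builds its dataset on demand, so no data files are shipped.
  generators <- list(
    dm = function() {
      data.frame(
        USUBJID = c("SUBJ-001", "SUBJ-002"),
        RFSTDTC = c("2022-01-10", "2022-02-03"),
        stringsAsFactors = FALSE
      )
    }
  )

  # Fail early with a helpful message for unknown or malformed names.
  if (!is.character(name) || length(name) != 1 || !name %in% names(generators)) {
    stop(
      "Unknown example dataset. Available: ",
      paste(names(generators), collapse = ", "),
      call. = FALSE
    )
  }

  generators[[name]]()
}

dm <- example_data("dm")
```

The trade-off against the /data-raw approach is that users have to learn a package-specific function rather than referring to a dataset by name.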

Given the small size of the data files, we discussed and agreed to use the /data-raw approach. Thanks @dgkf

Note that the test data for DM also needs to be updated so that any missing values are left as missing, rather than being imputed with NA values.