reconverse/incidence2

version2 - vendor the outbreaks data we keep using in examples

Closed this issue · 5 comments

@thibautjombart - The only data we use in incidence2 from outbreaks is ebola_sim_clean$linelist. I'm thinking of vendoring this (i.e. including the data within incidence2) in the next release as this will ensure all examples run (in a useful way) without the presence of outbreaks. The plan would be to still explicitly call outbreaks in the vignette, just not the example.

How would you like the data set attributed? To the outbreaks package or should I also include the attribution to yourself, Pierre and the additional references as in help("ebola_sim_clean", "outbreaks")?
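
For reference, the vendoring step described above might look roughly like the following one-off maintainer script. This is only a sketch: the script path and the object name `ebola_linelist` are hypothetical, and it assumes outbreaks is installed locally and that its datasets can be reached with `::`.

```r
# data-raw/ebola_linelist.R (hypothetical script, run once from the package root)
# Assumption: outbreaks is installed on the maintainer's machine.
if (requireNamespace("outbreaks", quietly = TRUE)) {
  # Hypothetical name for the vendored copy of the linelist
  ebola_linelist <- outbreaks::ebola_sim_clean$linelist

  # Store the copy in the package's data/ directory so it ships with the package
  dir.create("data", showWarnings = FALSE)
  save(ebola_linelist,
       file = file.path("data", "ebola_linelist.rda"),
       compress = "xz")
}
```

The attribution question would then be answered in the Rd file documenting the vendored object.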

To make sure I understand: is the issue that some examples are currently not run during checks because outbreaks may not be installed? Or that examples may not run when users try to run them?

I liked the idea of a centralized repo for datasets used in other packages' examples, as it avoids some potential issues:

  • maintenance: maintaining several separate copies of the data can be extra work, although probably low in practice as these things usually don't change over time
  • credit: it is annoying to copy-paste the original Rd, but I can imagine a credit cascade going wrong (here, arrows mean 'this comes from'): A <- B <- C; users of C credit package B, which would be wrong if we applied CC-BY concepts to the data distributed in A

If this is solely for testing purposes, maybe one option would be to have a local, non-exported copy of the dataset just used in tests?

> is the issue that some examples are currently not run during checks because outbreaks may not be installed? Or that examples may not run when users try to run them?

The latter. However, as it's just a data package and not actually going to change, it makes more sense to import outbreaks (Imports rather than Suggests) and provide the dataset directly through a re-export. Will go that route I think.

Got you, thanks. I think Import may be the easiest approach indeed. Thanks a lot for working on this!

Hmm - can't actually re-export as nothing is exported. Am leaving as is for now, but AFAIK two possible solutions would be:

  • Import the whole package and then load data directly. This would generate an unused imports warning which I'd need to explain to CRAN?
  • As above but have a token use (e.g. variable assignment) of outbreaks somewhere within the package.
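
As a rough illustration of the second option, a token use could be as small as the sketch below. The file and the helper name `ebola_linelist()` are hypothetical and not part of incidence2. It assumes outbreaks is listed under `Imports:` in DESCRIPTION and that its datasets are lazy-loaded, so they can be reached with `::` even though the namespace exports nothing.

```r
# R/data-helpers.R (hypothetical file inside the package)
# Assumption: DESCRIPTION lists outbreaks under Imports.

# Hypothetical internal helper. Referring to the dataset via `::` is an
# explicit use of the imported package in package code, which should stop
# R CMD check from flagging the import as unused.
ebola_linelist <- function() {
  outbreaks::ebola_sim_clean$linelist
}
```

Examples could then call the helper instead of attaching outbreaks, while the vignette keeps calling outbreaks explicitly.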