pharmaverse/admiralvaccine

Documentation: data source in R/data.R

Closed this issue · 9 comments

kaz462 commented

Hi admiralvaccine team, could the data sources be added to `R/data.R`? See, e.g., `data.R` in admiralophtha and `data.R` in admiral.test.
Thanks! Related: pharmaverse/pharmaversesdtm#4

Thanks @kaz462, the data sources have been added to R/data.R. Closing this issue.

kaz462 commented

Thanks @ahasoplakus !
For the data sources, could you please list the original sources or programs instead of the admiralvaccine URLs? For example, some datasets are copied from admiral.test, while others are reconstructed by programs such as the following one from admiralophtha.
(This information is needed for pharmaversesdtm so that we can recreate the admiralvaccine data there. Thanks again for working on this!)

```r
#' Best Corrected Visual Acuity Analysis Dataset
#'
#' An example Best Corrected Visual Acuity (BCVA) analysis dataset
#' @keywords datasets
#' @family datasets
#' @source
#' Derived from the `oe` and `ADSL` datasets using `{admiral}`, `{admiralophtha}` and
#' (\url{https://github.com/pharmaverse/admiralophtha/blob/main/inst/templates/ad_adbcva.R})
#'
"admiralophtha_adbcva"
```

@kaz462 Thank you for the clarification. @arjoon-r @vikrams95 could you please check on your end?

@kaz462 Thanks for bringing this up. We do not have source code for the datasets we used, except `vx_is` and `vx_suppis`. We just mocked the data and used it due to data complexity and time constraint. Anyway, we will work on adding source code for all the datasets and deploy it in a future release. Thanks!!

kaz462 commented

@vikrams95 @ahasoplakus Thanks for checking!
cc: @pharmaverse/admiraldata

> We just mocked the data and used it due to data complexity and time constraint

Hi @vikrams95 @ahasoplakus @neetusan.

In any pharmaverse package, transparency and reproducibility are key: we need to be able to see where data and code come from, and what we are doing to them. This also ensures that nothing proprietary is accidentally open-sourced. Thus, we cannot just mock up the data without including the source program or documentation explaining its source. I should also add that without source programs, editing your data in the future becomes much harder.

As such, could you please make it a priority to locate your source programs (or, alternatively, re-create them entirely) so that we can include them in pharmaversesdtm? Otherwise we will not be able to include these datasets in pharmaversesdtm.

Thanks!

Thanks @manciniedoardo, we will add the source programs for the remaining data files and update R/data.R accordingly.

Hi @kaz462 and @manciniedoardo, the data sources have been included in R/data.R. We have also created a source program for each vaccine-specific SDTM dataset and added them to the inst/create_vx_data folder.
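For illustration, an `R/data.R` entry that names a source program rather than a URL could follow the admiralophtha pattern quoted earlier. This is only a sketch: the dataset title and the program file name below are assumptions, not the actual admiralvaccine files.

```r
# Sketch only: the program file name referenced below is hypothetical.
#' Immunogenicity Specimen Assessments (IS) Test Dataset
#'
#' An example vaccine-specific IS SDTM dataset
#' @keywords datasets
#' @family datasets
#' @source
#' Created by `inst/create_vx_data/create_is.R` (hypothetical file name)
#'
"vx_is"
```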

Thanks @arjoon-r! The plan now would be to move the programs over to {pharmaversesdtm} (our one source for test SDTM data). However, we probably will not have time to include them in the first release of {pharmaversesdtm}, which is scheduled for 4th Sept. Additionally, although our guidance for test data does not list it as mandatory to use the CDISC pilot DM dataset as a basis (so that USUBJIDs etc. are consistent), all the other test data does this, so it might be worth doing the same in the vaccine programs.
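A minimal sketch of that suggestion in base R: build a vaccine-specific domain (here IS) on top of the subjects in a DM dataset, so that USUBJIDs stay consistent with the other test data, and fix the random seed so the mocked values are reproducible. The three-row `dm` stand-in, the visit schedule, the placeholder test code and the log-normal results are all assumptions for illustration. A real program would start from the CDISC pilot DM instead (for example `pharmaversesdtm::dm`).

```r
# Stand-in for the CDISC pilot DM dataset (assumption: in a real program this
# would be the full pilot DM, e.g. pharmaversesdtm::dm).
dm <- data.frame(
  STUDYID = "CDISCPILOT01",
  USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028"),
  stringsAsFactors = FALSE
)

# Fixed seed so the mocked results are identical on every run
set.seed(2023)

# Hypothetical visit schedule, for illustration only
visits <- data.frame(VISITNUM = c(10, 30), VISIT = c("VISIT 1", "VISIT 3"))

# by = NULL gives the Cartesian product: one row per subject x visit
is <- merge(dm, visits, by = NULL)
is <- is[order(is$USUBJID, is$VISITNUM), ]

is$DOMAIN <- "IS"
is$ISTESTCD <- "TEST1" # placeholder test code (hypothetical)
# Mocked numeric results drawn from a log-normal distribution (assumption)
is$ISSTRESN <- round(rlnorm(nrow(is), meanlog = 3, sdlog = 1), 1)
# Sequence number restarting at 1 within each subject
is$ISSEQ <- ave(seq_len(nrow(is)), is$USUBJID, FUN = seq_along)
rownames(is) <- NULL
```

Because every row is derived from `dm`, no USUBJID can appear in IS that is missing from DM, and swapping in the real pilot DM requires changing only the first assignment.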