pharmaverse/datacutr

Feature Request: More verbose/detailed error message when incorrect datasets are fed into `process_cut`

Feature Idea

If I call process_cut like so:

cut_data <- process_cut(
  source_sdtm_data = source_data,
  patient_cut_v = c("ds"),
  date_cut_m = rbind(
    c("ae", "AESTDTC"),
    c("ec", "ECSTDTC"),
    c("vs", "VSDTC")
  ),
  no_cut_v = c("ta", "td", "te", "ti"),
  dataset_cut = dcut,
  cut_var = DCUTDTM,
  special_dm = TRUE
)

but source_data doesn't contain exactly the domains "ae", "ec", "vs", "ds", "ta", "td", "te" and "ti" specified in the call, I get the following error:

Error: Inconsistency between input SDTMv datasets and the SDTMv datasets
listed under each cut approach. Please check for the two likely issues below... 

1) There are input SDTMv datasets where no cut method has been defined.
2) A cut method has been defined for a SDTMv dataset that does not exist in the
source SDTMv data.

when really I'd want the function to tell me which datasets don't match.

Proposed Solution:

  • Create a new vector that concatenates patient_cut_v, no_cut_v and column 1 of date_cut_m. Call this cut_inputs.
  • If special_dm = TRUE, append "dm" to cut_inputs for the checks below.
  • Extract the name of each SDTMv dataframe in source_sdtm_data and store the names as a vector.
  • Check that each dataframe name appears only once in source_sdtm_data. If this fails, the error message would say "Error: <dataframe name> appears more than once in source_sdtm_data."
  • Check that each dataframe name in source_sdtm_data appears in cut_inputs. If this fails, the error message would say "Error: <dataframe name> appears in source_sdtm_data but no cut method has been assigned."
  • Check that each name in cut_inputs appears only once. If this fails, the error message would say "Error: Multiple cut types have been assigned for <dataframe name>."
  • Check that each name in cut_inputs appears in source_sdtm_data. If this fails, the error message would say "Error: Cut types have been assigned for <dataframe name>, which does not exist in source_sdtm_data."
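
The checks above could be sketched roughly as follows. This is only an illustration, not the datacutr implementation: check_cut_inputs and describe_mismatch are hypothetical helper names, and the sketch assumes source_sdtm_data is a named list of dataframes. It uses duplicated() and setdiff() instead of explicit loops, and collects all problems before stopping so every mismatching dataset is reported at once.

```r
# Hypothetical helper (not part of datacutr): format one problem line,
# or return nothing when there are no offending datasets.
describe_mismatch <- function(label, datasets) {
  if (length(datasets) == 0) {
    return(character(0))
  }
  paste0(label, ": ", paste(datasets, collapse = ", "))
}

# Hypothetical helper (not part of datacutr): run the four proposed checks
# and raise a single error naming every mismatching dataset.
check_cut_inputs <- function(source_sdtm_data,
                             patient_cut_v = NULL,
                             date_cut_m = NULL,
                             no_cut_v = NULL,
                             special_dm = TRUE) {
  # Column 1 of date_cut_m holds the dataset names
  date_cut_names <- if (is.null(date_cut_m)) character(0) else date_cut_m[, 1]
  cut_inputs <- c(patient_cut_v, date_cut_names, no_cut_v)
  if (special_dm) {
    cut_inputs <- c(cut_inputs, "dm")
  }

  # Assumes source_sdtm_data is a named list of dataframes
  source_names <- names(source_sdtm_data)
  if (is.null(source_names)) {
    source_names <- character(0)
  }

  problems <- c(
    describe_mismatch(
      "Appears more than once in source_sdtm_data",
      unique(source_names[duplicated(source_names)])
    ),
    describe_mismatch(
      "In source_sdtm_data but no cut method has been assigned",
      setdiff(source_names, cut_inputs)
    ),
    describe_mismatch(
      "Multiple cut types have been assigned",
      unique(cut_inputs[duplicated(cut_inputs)])
    ),
    describe_mismatch(
      "Cut type assigned but dataset does not exist in source_sdtm_data",
      setdiff(cut_inputs, source_names)
    )
  )

  if (length(problems) > 0) {
    stop(
      paste(c("Inconsistent cut inputs.", problems), collapse = "\n"),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Example with made-up data: "lb" has no cut method, "vs" and "ta" are missing
example_source <- list(
  ae = data.frame(), ds = data.frame(), dm = data.frame(), lb = data.frame()
)
example_message <- tryCatch(
  check_cut_inputs(
    example_source,
    patient_cut_v = c("ds"),
    date_cut_m = rbind(c("ae", "AESTDTC"), c("vs", "VSDTC")),
    no_cut_v = c("ta")
  ),
  error = function(e) conditionMessage(e)
)
cat(example_message, "\n")
```

Reporting all four categories in one message means a single failed call is enough to see everything that needs fixing.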