pharmaverse/metatools

Multiple supp IDVAR values going to the same QNAM add multiple QNAM.x, QNAM.y, etc. columns


In the example below, there should be a single "AETRTEM" column, not separate "AETRTEM.x" and "AETRTEM.y" columns.

library(metatools)
library(tidyverse)

simple_ae <-
  safetyData::sdtm_ae |>
  filter(USUBJID %in% c("01-701-1015", "01-701-1023"))
simple_suppae <- safetyData::sdtm_suppae[c(1, 4), ]
simple_suppae$IDVAR[2] <- "AEDTC"
simple_suppae$IDVARVAL[2] <- "2012-09-02"
combine_supp(simple_ae, supp = simple_suppae)
#>        STUDYID DOMAIN     USUBJID AESEQ AESPID
#> 1 CDISCPILOT01     AE 01-701-1015     1    E07
#> 2 CDISCPILOT01     AE 01-701-1015     2    E08
#> 3 CDISCPILOT01     AE 01-701-1015     3    E06
#> 4 CDISCPILOT01     AE 01-701-1023     3    E10
#> 5 CDISCPILOT01     AE 01-701-1023     1    E08
#> 6 CDISCPILOT01     AE 01-701-1023     2    E09
#> 7 CDISCPILOT01     AE 01-701-1023     4    E08
#>                                 AETERM                    AELLT AELLTCD
#> 1            APPLICATION SITE ERYTHEMA APPLICATION SITE REDNESS      NA
#> 2            APPLICATION SITE PRURITUS APPLICATION SITE ITCHING      NA
#> 3                            DIARRHOEA                 DIARRHEA      NA
#> 4 ATRIOVENTRICULAR BLOCK SECOND DEGREE   AV BLOCK SECOND DEGREE      NA
#> 5                             ERYTHEMA                 ERYTHEMA      NA
#> 6                             ERYTHEMA       LOCALIZED ERYTHEMA      NA
#> 7                             ERYTHEMA                 ERYTHEMA      NA
#>                                AEDECOD AEPTCD    AEHLT AEHLTCD    AEHLGT
#> 1            APPLICATION SITE ERYTHEMA     NA HLT_0617      NA HLGT_0152
#> 2            APPLICATION SITE PRURITUS     NA HLT_0317      NA HLGT_0338
#> 3                            DIARRHOEA     NA HLT_0148      NA HLGT_0588
#> 4 ATRIOVENTRICULAR BLOCK SECOND DEGREE     NA HLT_0415      NA HLGT_0086
#> 5                             ERYTHEMA     NA HLT_0284      NA HLGT_0192
#> 6                             ERYTHEMA     NA HLT_0284      NA HLGT_0192
#> 7                             ERYTHEMA     NA HLT_0284      NA HLGT_0192
#>   AEHLGTCD                                             AEBODSYS AEBDSYCD
#> 1       NA GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS       NA
#> 2       NA GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS       NA
#> 3       NA                           GASTROINTESTINAL DISORDERS       NA
#> 4       NA                                    CARDIAC DISORDERS       NA
#> 5       NA               SKIN AND SUBCUTANEOUS TISSUE DISORDERS       NA
#> 6       NA               SKIN AND SUBCUTANEOUS TISSUE DISORDERS       NA
#> 7       NA               SKIN AND SUBCUTANEOUS TISSUE DISORDERS       NA
#>                                                  AESOC AESOCCD    AESEV AESER
#> 1 GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS      NA     MILD     N
#> 2 GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS      NA     MILD     N
#> 3                           GASTROINTESTINAL DISORDERS      NA     MILD     N
#> 4                                    CARDIAC DISORDERS      NA     MILD     N
#> 5               SKIN AND SUBCUTANEOUS TISSUE DISORDERS      NA     MILD     N
#> 6               SKIN AND SUBCUTANEOUS TISSUE DISORDERS      NA MODERATE     N
#> 7               SKIN AND SUBCUTANEOUS TISSUE DISORDERS      NA     MILD     N
#>   AEACN    AEREL                      AEOUT AESCAN AESCONG AESDISAB AESDTH
#> 1    NA PROBABLE NOT RECOVERED/NOT RESOLVED      N       N        N      N
#> 2    NA PROBABLE NOT RECOVERED/NOT RESOLVED      N       N        N      N
#> 3    NA   REMOTE         RECOVERED/RESOLVED      N       N        N      N
#> 4    NA POSSIBLE NOT RECOVERED/NOT RESOLVED      N       N        N      N
#> 5    NA POSSIBLE NOT RECOVERED/NOT RESOLVED      N       N        N      N
#> 6    NA PROBABLE NOT RECOVERED/NOT RESOLVED      N       N        N      N
#> 7    NA POSSIBLE         RECOVERED/RESOLVED      N       N        N      N
#>   AESHOSP AESLIFE AESOD      AEDTC    AESTDTC    AEENDTC AESTDY AEENDY
#> 1       N       N     N 2014-01-16 2014-01-03       <NA>      2     NA
#> 2       N       N     N 2014-01-16 2014-01-03       <NA>      2     NA
#> 3       N       N     N 2014-01-16 2014-01-09 2014-01-11      8     10
#> 4       N       N     N 2012-08-27 2012-08-26       <NA>     22     NA
#> 5       N       N     N 2012-08-27 2012-08-07 2012-08-30      3     26
#> 6       N       N     N 2012-08-27 2012-08-07       <NA>      3     NA
#> 7       N       N     N 2012-09-02 2012-08-07 2012-08-30      3     26
#>   AETRTEM.x AETRTEM.y
#> 1      <NA>         Y
#> 2      <NA>      <NA>
#> 3      <NA>      <NA>
#> 4      <NA>      <NA>
#> 5      <NA>      <NA>
#> 6      <NA>      <NA>
#> 7         Y      <NA>

Created on 2024-04-12 with reprex v2.1.0
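
The suffixes can be reproduced without safetyData. The sketch below uses made-up toy data and plain dplyr joins, on the assumption that the supp data is widened once per IDVAR and joined to the parent dataset one IDVAR at a time. It is an illustration of the mechanism, not the actual combine_supp() code.

```r
library(dplyr)

# Toy parent domain (made-up values)
ae <- data.frame(
  USUBJID = c("01", "01"),
  AESEQ   = c(1, 2),
  AEDTC   = c("2014-01-16", "2012-09-02")
)

# One wide supp table per IDVAR, both carrying the same QNAM (AETRTEM)
supp_by_aeseq <- data.frame(USUBJID = "01", AESEQ = 1, AETRTEM = "Y")
supp_by_aedtc <- data.frame(USUBJID = "01", AEDTC = "2012-09-02", AETRTEM = "Y")

# The second join finds AETRTEM on both sides, so dplyr adds .x/.y suffixes
joined <- ae |>
  left_join(supp_by_aeseq, by = c("USUBJID", "AESEQ")) |>
  left_join(supp_by_aedtc, by = c("USUBJID", "AEDTC"))

names(joined)

# Folding the two columns back into one gives the expected single column
fixed <- joined |>
  mutate(AETRTEM = coalesce(AETRTEM.x, AETRTEM.y)) |>
  select(-AETRTEM.x, -AETRTEM.y)
```

Here each row takes its value from whichever IDVAR matched it, so the two suffixed columns never conflict.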

I'm working on a PR for this now.

As I was working on this PR, I found this test that I don't understand.

I thought that combine_supp() would have required (or assumed) the dataset argument to be a valid SDTM dataset. But the ae dataset used on line 176 here already has SUPPVAR1, SUPPVAR2, and SUPPVAR3 columns. combine_supp() then renames those to SUPPVAR1.x, etc. and adds new SUPPVAR1.y columns. The test on lines 177 to 179 explicitly uses those columns.

ae <- safetyData::sdtm_ae %>%
  mutate(
    SUPPVAR1 = letters[1:nrow(safetyData::sdtm_ae)],
    SUPPVAR2 = rep(letters, 36)[1:nrow(safetyData::sdtm_ae)],
    SUPPVAR3 = USUBJID,
    IDVAR = as.numeric(str_extract(USUBJID, "\\d{3}$"))
  )
### Mock up a metadata necessary to make the SUPP
supp_meta <- tibble::tribble(
  ~qnam, ~qlabel, ~idvar, ~qeval, ~qorig,
  "SUPPVAR1", "Supp Test 1", "AESEQ", "Investigator", "CRF",
  "SUPPVAR2", "Supp Test 2", "AESEQ", "Investigator", "CRF",
  "SUPPVAR3", "Supp Test 3", "IDVAR", "Investigator", "CRF",
)
### Wrap and map
suppae <- pmap_dfr(supp_meta, build_qnam, dataset = ae) %>%
  arrange(USUBJID, QNAM, IDVARVAL)
dataset = ae %>%
  select(-starts_with("SUPP"))
supp = suppae
multi_out <- combine_supp(ae, suppae) %>%
  dplyr::summarise(
    v1 = all(all.equal(SUPPVAR1.x, SUPPVAR1.y)), # Because there are NA rows
    v2 = all(all.equal(SUPPVAR2.x, SUPPVAR2.y)),
    v3 = all(SUPPVAR3.x == SUPPVAR3.y)
  ) %>%
  tidyr::pivot_longer(everything()) %>%
  pull(value) %>%
  all()
expect_equal(multi_out, TRUE)
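
To make the collision in this test concrete, the following sketch lists the QNAM values that already exist as columns in the parent dataset. The helper name find_qnam_collisions() and the toy data are hypothetical and only illustrate the check; nothing here is part of metatools.

```r
# Hypothetical helper: QNAM values that are already column names in `dataset`
find_qnam_collisions <- function(dataset, supp) {
  intersect(unique(supp$QNAM), names(dataset))
}

# Toy data (made up): the parent dataset already carries SUPPVAR1
ae_toy <- data.frame(USUBJID = "01", AESEQ = 1, SUPPVAR1 = "a")
supp_toy <- data.frame(
  USUBJID  = "01",
  IDVAR    = "AESEQ",
  IDVARVAL = "1",
  QNAM     = c("SUPPVAR1", "AETRTEM"),
  QVAL     = c("a", "Y")
)

collisions <- find_qnam_collisions(ae_toy, supp_toy)
collisions
```

Any non-empty result means a join on the widened supp data would have to suffix the clashing names.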

I would have thought that the preferred behavior would have been:

  1. No QNAM value may already exist as a column in the original dataset
  2. Generate a list of wide-supp datasets for all QNAM/IDVAR combinations (the current code groups only by IDVAR:
    group_by(IDVAR) %>% # For when there are multiple IDs
    )
  3. Merge each of the new wide-supp datasets

And, step 3 above should account for repeated QNAM values in different IDVAR rows. (This is the issue I'm trying to address here.)
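
A rough sketch of those three steps follows, using dplyr and tidyr on toy data. The function name combine_supp_sketch(), the temporary "supp__" column prefix, and the assumption that every supp row has a non-missing IDVAR are all mine. This is not the metatools implementation and not necessarily what the PR will do.

```r
library(dplyr)
library(tidyr)

# Hypothetical sketch; assumes USUBJID is the only other merge key and
# that no dataset column starts with "supp__"
combine_supp_sketch <- function(dataset, supp) {
  qnams <- unique(supp$QNAM)

  # Step 1: refuse QNAM values that already exist as columns
  clash <- intersect(qnams, names(dataset))
  if (length(clash) > 0) {
    stop("QNAM already present in dataset: ", paste(clash, collapse = ", "))
  }

  out <- dataset
  out[qnams] <- NA_character_ # one target column per QNAM

  # Step 2: one wide supp table per IDVAR
  for (piece in split(supp, supp$IDVAR)) {
    id_var <- piece$IDVAR[1]
    wide <- piece |>
      select(USUBJID, IDVARVAL, QNAM, QVAL) |>
      pivot_wider(names_from = QNAM, values_from = QVAL, names_prefix = "supp__")

    # IDVARVAL is character, so match on the character form of the key
    out$IDVARVAL <- as.character(out[[id_var]])

    # Step 3: join, then fold each new value into the single QNAM column
    out <- left_join(out, wide, by = c("USUBJID", "IDVARVAL"))
    for (q in unique(piece$QNAM)) {
      out[[q]] <- coalesce(out[[q]], out[[paste0("supp__", q)]])
    }
    out <- select(out, -IDVARVAL, -starts_with("supp__"))
  }
  out
}

# Toy data (made up): the same QNAM reached through two different IDVARs
ae <- data.frame(
  USUBJID = c("01", "01"),
  AESEQ   = c(1, 2),
  AEDTC   = c("2014-01-16", "2012-09-02")
)
supp <- data.frame(
  USUBJID  = c("01", "01"),
  IDVAR    = c("AESEQ", "AEDTC"),
  IDVARVAL = c("1", "2012-09-02"),
  QNAM     = "AETRTEM",
  QVAL     = "Y"
)

result <- combine_supp_sketch(ae, supp)
names(result)
```

Coalescing keeps the first non-missing value per row, so a real implementation would also need to decide what to do when two IDVARs supply different values for the same record.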

I'm going to have the PR change the behavior that adds column names not identical to the QNAM (the .x and .y suffixes), because matching the QNAM seems more accurate for the SDTM standard. This will be a breaking change.

I'm happy to chat about it if there is a reason to keep the current behavior.