Multiple supp IDVAR values going to the same QNAM adds multiple QNAM.x, QNAM.y, etc. columns
In the example below, there should be a single "AETRTEM" column, not separate "AETRTEM.x" and "AETRTEM.y" columns.
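library(metatools)
library(tidyverse)

# Keep the AE records for two subjects
simple_ae <-
  safetyData::sdtm_ae |>
  filter(USUBJID %in% c("01-701-1015", "01-701-1023"))

# Take two SUPPAE rows and point the second one at a different IDVAR (AEDTC),
# so that the same QNAM is keyed by two different IDVARs
simple_suppae <- safetyData::sdtm_suppae[c(1, 4), ]
simple_suppae$IDVAR[2] <- "AEDTC"
simple_suppae$IDVARVAL[2] <- "2012-09-02"

combine_supp(simple_ae, supp = simple_suppae)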
#> STUDYID DOMAIN USUBJID AESEQ AESPID
#> 1 CDISCPILOT01 AE 01-701-1015 1 E07
#> 2 CDISCPILOT01 AE 01-701-1015 2 E08
#> 3 CDISCPILOT01 AE 01-701-1015 3 E06
#> 4 CDISCPILOT01 AE 01-701-1023 3 E10
#> 5 CDISCPILOT01 AE 01-701-1023 1 E08
#> 6 CDISCPILOT01 AE 01-701-1023 2 E09
#> 7 CDISCPILOT01 AE 01-701-1023 4 E08
#> AETERM AELLT AELLTCD
#> 1 APPLICATION SITE ERYTHEMA APPLICATION SITE REDNESS NA
#> 2 APPLICATION SITE PRURITUS APPLICATION SITE ITCHING NA
#> 3 DIARRHOEA DIARRHEA NA
#> 4 ATRIOVENTRICULAR BLOCK SECOND DEGREE AV BLOCK SECOND DEGREE NA
#> 5 ERYTHEMA ERYTHEMA NA
#> 6 ERYTHEMA LOCALIZED ERYTHEMA NA
#> 7 ERYTHEMA ERYTHEMA NA
#> AEDECOD AEPTCD AEHLT AEHLTCD AEHLGT
#> 1 APPLICATION SITE ERYTHEMA NA HLT_0617 NA HLGT_0152
#> 2 APPLICATION SITE PRURITUS NA HLT_0317 NA HLGT_0338
#> 3 DIARRHOEA NA HLT_0148 NA HLGT_0588
#> 4 ATRIOVENTRICULAR BLOCK SECOND DEGREE NA HLT_0415 NA HLGT_0086
#> 5 ERYTHEMA NA HLT_0284 NA HLGT_0192
#> 6 ERYTHEMA NA HLT_0284 NA HLGT_0192
#> 7 ERYTHEMA NA HLT_0284 NA HLGT_0192
#> AEHLGTCD AEBODSYS AEBDSYCD
#> 1 NA GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS NA
#> 2 NA GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS NA
#> 3 NA GASTROINTESTINAL DISORDERS NA
#> 4 NA CARDIAC DISORDERS NA
#> 5 NA SKIN AND SUBCUTANEOUS TISSUE DISORDERS NA
#> 6 NA SKIN AND SUBCUTANEOUS TISSUE DISORDERS NA
#> 7 NA SKIN AND SUBCUTANEOUS TISSUE DISORDERS NA
#> AESOC AESOCCD AESEV AESER
#> 1 GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS NA MILD N
#> 2 GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS NA MILD N
#> 3 GASTROINTESTINAL DISORDERS NA MILD N
#> 4 CARDIAC DISORDERS NA MILD N
#> 5 SKIN AND SUBCUTANEOUS TISSUE DISORDERS NA MILD N
#> 6 SKIN AND SUBCUTANEOUS TISSUE DISORDERS NA MODERATE N
#> 7 SKIN AND SUBCUTANEOUS TISSUE DISORDERS NA MILD N
#> AEACN AEREL AEOUT AESCAN AESCONG AESDISAB AESDTH
#> 1 NA PROBABLE NOT RECOVERED/NOT RESOLVED N N N N
#> 2 NA PROBABLE NOT RECOVERED/NOT RESOLVED N N N N
#> 3 NA REMOTE RECOVERED/RESOLVED N N N N
#> 4 NA POSSIBLE NOT RECOVERED/NOT RESOLVED N N N N
#> 5 NA POSSIBLE NOT RECOVERED/NOT RESOLVED N N N N
#> 6 NA PROBABLE NOT RECOVERED/NOT RESOLVED N N N N
#> 7 NA POSSIBLE RECOVERED/RESOLVED N N N N
#> AESHOSP AESLIFE AESOD AEDTC AESTDTC AEENDTC AESTDY AEENDY
#> 1 N N N 2014-01-16 2014-01-03 <NA> 2 NA
#> 2 N N N 2014-01-16 2014-01-03 <NA> 2 NA
#> 3 N N N 2014-01-16 2014-01-09 2014-01-11 8 10
#> 4 N N N 2012-08-27 2012-08-26 <NA> 22 NA
#> 5 N N N 2012-08-27 2012-08-07 2012-08-30 3 26
#> 6 N N N 2012-08-27 2012-08-07 <NA> 3 NA
#> 7 N N N 2012-09-02 2012-08-07 2012-08-30 3 26
#> AETRTEM.x AETRTEM.y
#> 1 <NA> Y
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> 6 <NA> <NA>
#> 7 Y <NA>
Created on 2024-04-12 with reprex v2.1.0
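The suffixes in the output above match what base R's merge() does when two successive joins each bring in a column with the same name. The following is a minimal sketch with toy data, assuming the supplemental records are joined one IDVAR at a time; it is not the combine_supp() source, and the data frames ae, supp_by_seq, and supp_by_dtc are made up for illustration.

```r
# Toy parent dataset (hypothetical values, not from safetyData)
ae <- data.frame(
  USUBJID = c("01", "01"),
  AESEQ = c(1, 2),
  AEDTC = c("2014-01-16", "2012-09-02"),
  stringsAsFactors = FALSE
)

# Two wide supplemental pieces carrying the same QNAM (AETRTEM),
# each keyed by a different IDVAR
supp_by_seq <- data.frame(
  USUBJID = "01", AESEQ = 1, AETRTEM = "Y",
  stringsAsFactors = FALSE
)
supp_by_dtc <- data.frame(
  USUBJID = "01", AEDTC = "2012-09-02", AETRTEM = "Y",
  stringsAsFactors = FALSE
)

# First join adds AETRTEM; the second join finds AETRTEM on both sides,
# so merge() disambiguates the two with its default .x / .y suffixes
step1 <- merge(ae, supp_by_seq, by = c("USUBJID", "AESEQ"), all.x = TRUE)
step2 <- merge(step1, supp_by_dtc, by = c("USUBJID", "AEDTC"), all.x = TRUE)

names(step2)
```

After the second join there is no plain AETRTEM column left, only the two suffixed ones, which is the same shape as the reprex output.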
I'm working on a PR for this now.
As I was working on this PR, I found this test that I don't understand.
I thought that combine_supp() was intended to require (or assume) that the dataset argument is a valid SDTM dataset. But the ae dataset used on line 176 here already has SUPPVAR1, SUPPVAR2, and SUPPVAR3 columns. combine_supp() then renames those to SUPPVAR1.x, etc. and adds new SUPPVAR1.y, etc. columns. The test on lines 177 to 179 explicitly uses those columns.
metatools/tests/testthat/test-supp.R
Lines 154 to 183 in d3a5642
I would have thought that the preferred behavior would have been:
1. No value of QNAM may already be a column in the original dataset
2. Generate a list of wide-supp datasets for all QNAM/IDVAR combinations (the current code only uses IDVAR; see Line 165 in d3a5642)
3. Merge each of the new wide-supp datasets
Step 3 above should also account for repeated QNAM values in different IDVAR rows. (This is the issue I'm trying to address here.)
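The three steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the metatools implementation or the eventual PR: combine_supp_sketch() is a hypothetical helper, it assumes every supp row has a non-empty IDVAR, it matches on USUBJID plus the IDVAR value compared as character, and when the same QNAM arrives through several IDVARs it keeps the first non-missing value per row.

```r
# Hypothetical helper for illustration only (not part of metatools)
combine_supp_sketch <- function(dataset, supp) {
  # Step 1: no QNAM may already exist as a column in the parent dataset
  clash <- intersect(unique(supp$QNAM), names(dataset))
  if (length(clash) > 0) {
    stop("QNAM already present in dataset: ", paste(clash, collapse = ", "))
  }

  # Step 2: one piece per QNAM/IDVAR combination
  pieces <- split(supp, list(supp$QNAM, supp$IDVAR), drop = TRUE)

  # Step 3: merge each piece, filling a single column per QNAM
  for (piece in pieces) {
    qnam <- piece$QNAM[1]
    idvar <- piece$IDVAR[1]

    # IDVARVAL is stored as character, so compare the parent key as character
    key_parent <- paste(dataset$USUBJID, as.character(dataset[[idvar]]), sep = "|")
    key_supp <- paste(piece$USUBJID, piece$IDVARVAL, sep = "|")
    new_vals <- piece$QVAL[match(key_parent, key_supp)]

    if (!(qnam %in% names(dataset))) {
      dataset[[qnam]] <- new_vals
    } else {
      # Same QNAM seen through another IDVAR: only fill rows still missing
      fill <- is.na(dataset[[qnam]])
      dataset[[qnam]][fill] <- new_vals[fill]
    }
  }
  dataset
}

# Toy data (hypothetical values): the same QNAM keyed by AESEQ and by AEDTC
ae <- data.frame(
  USUBJID = c("01", "01"),
  AESEQ = c(1, 2),
  AEDTC = c("2014-01-16", "2012-09-02"),
  stringsAsFactors = FALSE
)
supp <- data.frame(
  USUBJID = c("01", "01"),
  IDVAR = c("AESEQ", "AEDTC"),
  IDVARVAL = c("1", "2012-09-02"),
  QNAM = c("AETRTEM", "AETRTEM"),
  QVAL = c("Y", "Y"),
  stringsAsFactors = FALSE
)

out <- combine_supp_sketch(ae, supp)
names(out)
```

With this approach the result has one AETRTEM column regardless of how many IDVARs feed it. A real implementation would also need to decide what to do when two IDVAR routes supply conflicting values for the same row.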
I'm going to make the PR change the behavior to add column names that are not identical to the QNAM name, because that seems to be more accurate for the SDTM standard. This will be a breaking change.
I'm happy to chat about it if there is a reason to keep the current behavior.