pharmaverse/metatools

Make create_cat_var & create_subgrps more general

Opened this issue · 15 comments

Currently create_cat_var calls create_subgrps to assign groups based on age ranges. My suggestion is to change the naming of the functions to more accurately represent what they do. Perhaps just create_age_grps or something like that. I found create_cat_var to be confusing in that it really only supports creating categories based on age ranges. Another fix could be expanding the code/decode options to account for the filtering itself.

Hello,
This function will work for any grouping, so it works with age, BMI, or any case where you turn a numeric variable into a categorical one based on the values in the code list. You can see in the example for the create_subgrps function that I use numbers from 1 to 10, which aren't typically ages but might be something like the number of exacerbations. The function is pretty simple: it just needs a numeric variable to split into categories and a code list that labels those categories. Does that make sense? If so, I would appreciate your feedback on how to improve the documentation.
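
To make the "any numeric variable" point concrete, here is a minimal sketch. The data are hypothetical exacerbation counts, and the base-R `cut()` call is only a stand-in for the idea, not how metatools implements it. The optional last step assumes {metatools} is installed and that `create_subgrps()` takes a numeric vector plus a vector of group definitions, as in its documented 1-to-10 example.

```r
# Hypothetical data: number of exacerbations per subject (not ages)
n_exac <- c(0, 1, 3, 5, 8, 10)

# Base-R stand-in: map numeric ranges to labels.
# The labels play the role of the decodes in a code list.
exac_grp <- cut(
  n_exac,
  breaks = c(-Inf, 1, 5, Inf),    # (-Inf, 1], (1, 5], (5, Inf]
  labels = c("<2", "2-5", ">5")
)
print(data.frame(n_exac, exac_grp))

# Optional: the same group definitions passed to metatools, if it is available
if (requireNamespace("metatools", quietly = TRUE)) {
  print(metatools::create_subgrps(n_exac, c("<2", "2-5", ">5")))
}
```

Nothing here is specific to age; only the numeric input and the labels change.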

Ah, interesting. I hadn't thought to try using any other numbers. Apologies for my confusion. I think in general having more examples in the documentation would be helpful.

Also, this may not be the right thread but how do you calculate BMI using metacore/metatools from Height and Weight? I'm just wondering what the correct way is to define derivations that's not simply grabbing a predecessor variable or mapping numeric ranges to categories.

The broader question I'm struggling with is whether I can use metacore/metatools to fully handle creating ADaMs from SDTMs. Additionally, admiral appears to be leading the way towards being the industry standard for deriving variables; how can metacore be combined with admiral? In addition to the code_decode column in codelist, could we have a code option which then triggers some custom code, perhaps using admiral, to be run to derive a variable? Not sure if this is doable currently or otherwise not advised.

Fair question. {metatools} isn't built to be a standalone tool; it is meant to be used in conjunction with things like admiral. The problem I ran into when looking to automate things is that there isn't enough consistency in how derivations are defined across companies to build automation on top of them. That is why I just stick to the predecessors. But that doesn't mean you couldn't develop a package for your company that reflects how your company defines things.

Gotcha. I think I just found a project to carve out :) hopefully I can open source / turn the learnings into a talk next year.

I guess there's a spectrum of options to define custom derivations. The (seemingly) most flexible way is to allow for R code to be defined in the codelist. For example, you could have a BMI variable which is defined as mutate(dm, BMI = WEIGHT / HEIGHT^2) or, leveraging admiral, you could derive a variable as derive_vars_merged(...).
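
For reference, BMI is weight in kilograms divided by the square of height in metres. A minimal base-R sketch of that derivation, assuming a hypothetical data frame with HEIGHT in centimetres and WEIGHT in kilograms (the column names and units are illustrative, not taken from any metacore specification):

```r
# Hypothetical subject-level data; HEIGHT in cm, WEIGHT in kg (assumed units)
vs <- data.frame(
  USUBJID = c("01", "02"),
  HEIGHT  = c(170, 160),
  WEIGHT  = c(70, 64)
)

# BMI = weight (kg) / height (m)^2, so convert cm to m first
vs$BMI <- vs$WEIGHT / (vs$HEIGHT / 100)^2

print(vs)
```

The unit conversion is the part that tends to differ between studies, which is one reason a derivation like this is hard to automate from metadata alone.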

One potential issue I foresee is updating the base metacore class so the codelist table can handle a format, say, other than a code/decode tibble in the code column. I'm just not sure if this is allowed currently or if it's better to create a subclass that extends the attributes of codelist.

You can define R code in the derivation table. You would just need to define it in such a way that you could programmatically identify it as R code. This shouldn't be done in the code list; instead, use the derivations table, which is how the predecessors are handled.

Do you have an example of this? I'm exploring some of the sample files in the inst/extdata directory of {metacore} but I haven't come across an example as you describe. When looking at the derivations table, the derivation column appears to mostly be descriptions and sometimes looks like a direct mapping of a predecessor variable. Is this where R code should go?

I don't have an example of it, but it is a free-text field, so you can put whatever you want into the derivation column of the derivations table.

Gotcha, thanks! So I went through an example of how it could be done. Here goes:

# Load dependencies
library(admiral)
library(admiral.test)
library(metacore)
library(metatools)
library(dplyr)
library(stringr)

# Load data
data("admiral_dm")

# Rename data. Needs to match code in derivation
dm <- admiral_dm

# Load metadata 
load(metacore_example("pilot_ADaM.rda"))

# Subset so we're only working with ADSL metadata
metacore <- metacore %>% select_dataset("ADSL")

# Create a calculated derivation (dummy one)
derivations <- metacore$derivations %>%
  mutate(derivation = ifelse(derivation_id == "MT.ADSL.AGE",
                             "=dm %>% .$AGE*10",
                             derivation))

# Re-create metacore object because the derivations attribute is read-only
meta_object <- metacore(metacore$ds_spec,
                        metacore$ds_vars,
                        metacore$var_spec,
                        metacore$value_spec,
                        derivations,
                        metacore$code_list)

# Function to create calculated variable from metadata
create_calc_var <- function(data, metacore) {
  
  # Take derivations starting with an equal sign
  vars_to_calc <- metacore$derivations %>%
    filter(str_detect(derivation, "^="))

  # Iterate through each calculated column (seq_len() handles zero rows safely)
  for (i in seq_len(nrow(vars_to_calc))) {

    # Get name from derivation_id, can't currently get it from derivation column like predecessors
    calc_col_name <- str_extract(vars_to_calc$derivation_id[i], "[^.]+$")

    # Remove equal sign from derivation column so it can be evaluated
    calc_col_expr <- str_remove(vars_to_calc$derivation[i], "^=")

    # Calculate the derived column. The text is evaluated from inside this function,
    # so any object it names (e.g. dm) must be visible from here, such as in the global environment
    data[[calc_col_name]] <- eval(parse(text = calc_col_expr))
  }
  
  data
  
}

# Initialize ADSL with predecessor variables
adsl_preds <- build_from_derived(metacore = meta_object, 
                                 ds_list = list("dm" = dm), 
                                 predecessor_only = FALSE, keep = TRUE)

# Add new calculated column using the derivation from metadata
adsl_preds %>% create_calc_var(metacore = meta_object)

# Now let's do a more complex derivation with exposure data using admiral
data("admiral_ex")
ex <- admiral_ex

# Define the derivation
complicated_code_derivation <- '= {
ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "last"
  )

adsl_preds %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & nchar(EXSTDTC) >= 10,
    new_vars = exprs(TRTSDTM = EXSTDTM),
    order = exprs(EXSTDTM, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>% 
  .$TRTSDTM
}'

# Add to derivation table
derivations <- metacore$derivations %>%
  add_row(derivation_id = "MT.ADSL.TRTSDTM",
          derivation = complicated_code_derivation)

# Re-create metacore object with updated derivation
meta_object <- metacore(metacore$ds_spec,
                        metacore$ds_vars,
                        metacore$var_spec,
                        metacore$value_spec,
                        derivations,
                        metacore$code_list)

# Create the complex derivation
adsl_preds %>% create_calc_var(metacore = meta_object)
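
One possible refinement, offered only as a sketch: the approach above resolves names such as dm, ex, and adsl_preds from whatever happens to be in the session. The hypothetical helper below (eval_derivation is not part of {metacore}, {metatools}, or {admiral}) evaluates the derivation text against an explicitly supplied list of datasets instead, so the inputs of each derivation are visible at the call site. It uses base R only.

```r
# Hypothetical helper: evaluate "=..." derivation text against named datasets
eval_derivation <- function(derivation, datasets) {
  stopifnot(
    is.character(derivation),
    length(derivation) == 1,
    startsWith(derivation, "=")   # same "leading equal sign" convention as above
  )

  # Strip the marker and parse the remaining text as R code
  expr <- parse(text = sub("^=", "", derivation))

  # Datasets are looked up first; anything else (functions from attached
  # packages, other globals) is still found through the global environment
  env <- list2env(datasets, parent = globalenv())
  eval(expr, envir = env)
}

# Small illustrative input (made-up values)
dm_small <- data.frame(USUBJID = c("01", "02"), AGE = c(34, 61))
result <- eval_derivation("= dm$AGE * 10", list(dm = dm_small))
print(result)
```

Because the global environment is still the parent, this does not sandbox the code; evaluating free text from a specification carries the usual risks of eval(parse()) and should only be done with trusted metadata.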

@daniel-woodie as FYI we now have an examples site showing some implementation examples of using the various packages together: https://pharmaverse.github.io/examples/adam/adsl.html. Look forward to hearing more on how you get on with the metadata-driven automation approach for ADaM using these packages. Maybe you might like to share a demo at one of our quarterly admiral community meetings sometime? Tagging @manciniedoardo who organises these.

Thanks for adding me in the loop @rossfarrugia, @daniel-woodie please feel free to reach out if you want a slot! Next one is currently planned for 9th May.

@manciniedoardo @rossfarrugia I'm in. We've worked some with Appsilon on a package extension to handle more complex derivations with {admiral}. We'll work to open source that in the next few weeks and can do a walk through at the next admiral community meeting. Interested in hearing feedback (thanks @kaz462 for connecting the dots here).

@daniel-woodie please do keep us in the loop. Look forward to hearing more and once open source we can discuss whether this extension should be put forward to pharmaverse too!

Great! I'll be in touch 😄

@daniel-woodie I've sent you a message on Slack to connect re the community meeting :)

Hi @daniel-woodie - just checking you saw my Slack message? Would love to feature you in our community meeting!