openpharma/visR

Improve experience for non-CDISC data

ddsjoberg opened this issue · 1 comment

The visualizations for time to event data in visR are amazing and will be used by many people, including those whose data does not follow CDISC conventions. We can do a couple of things to improve their experience, while not taking away from the experience of those working with CDISC data.

  1. Export a function to convert conventional Surv(time, event) coding to CDISC AVAL, CNSR coding. I don't think I'll ever memorize that AVAL is the name of the time column 😆 Something small and easy like the example below would work.
  2. When the data lack 'PARAM' and 'PARAMCD' columns, visr() prints a warning (see below) about the x-axis label. We don't need a warning that tells users what was not done. This should be handled the same way as labels elsewhere in the package: if a column has a label, we use it; if it doesn't, visr() falls back to the variable name without warning that the label wasn't used. The visr() help file already documents that when 'PARAM' and 'PARAMCD' are present, their values are used to construct the x-axis label. Let's remove this warning.
library(visR)

as_CDISC_names <- function(data, time, event) {
  time <- dplyr::select(data, {{ time }}) %>% names()
  event <- dplyr::select(data, {{ event }}) %>% names()
  
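  # Surv() recodes 1/2 status to 0/1; unclass() exposes the time/status matrix
  surv_mat <- survival::Surv(time = data[[time]], event = data[[event]]) %>% unclass()
  
  # convert the event indicator to a censoring indicator (CDISC CNSR: 1 = censored)
  data[[event]] <- 1 - surv_mat[, "status"]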
  
  # rename columns to be in CDISC format
  data %>%
    dplyr::rename(AVAL = !!time, CNSR = !!event)
}

survival::lung %>%
  as_CDISC_names(time, status) %>%
  estimate_KM(strata = "sex") %>%
  visr()
#> Warning in visr.survfit(.): The x-axis label was not specified and could also
#> not be automatically determined due to absence of 'PARAM' and 'PARAMCD'.

Created on 2022-05-10 by the reprex package (v2.0.1)
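To make point 2 concrete, a silent fallback could follow the same convention used for other labels: an explicit label wins, then the CDISC columns, then the column's label attribute, then the variable name, with no warning at any step. This is a minimal sketch only; `choose_x_label()` is a hypothetical helper, not a visR function, and using the first `PARAM` value as the CDISC label is an assumption about how visR builds that label.

```r
# Hypothetical helper, not part of visR: pick an x-axis label without warning.
# Assumption: the CDISC label is simply the first value of PARAM.
choose_x_label <- function(data, x_label = NULL, time_var = "AVAL") {
  # an explicitly supplied label always wins
  if (!is.null(x_label)) return(x_label)
  # CDISC data: build the label from the PARAM column
  if (all(c("PARAM", "PARAMCD") %in% names(data))) {
    return(as.character(unique(data[["PARAM"]])[1]))
  }
  # otherwise use the time column's label attribute, if it has one
  lbl <- attr(data[[time_var]], "label", exact = TRUE)
  if (!is.null(lbl)) return(as.character(lbl))
  # finally fall back to the variable name, silently
  time_var
}
```

With this ordering, non-CDISC data simply gets "AVAL" (or the column label) on the x-axis, mirroring how visR already treats unlabeled variables elsewhere.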

I think a better implementation of the first point may be possible. Many users would likely prefer the familiar data-plus-formula interface, so maybe we can implement a formula method for estimate_KM().

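estimate_KM <- function(x, ...) {
  UseMethod("estimate_KM")
}

# data frames have class "data.frame", so the S3 method is named accordingly;
# its first argument must also match the generic's `x`
estimate_KM.data.frame <- function(x, AVAL, CNSR, strata, ...) {
  # the current estimate_KM() implementation would go here
}

estimate_KM.formula <- function(x, data, ...) {
  # parse the formula `x` and pass its elements to `estimate_KM.data.frame()`
}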
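A formula method would mostly need to turn `Surv(time, event) ~ strata` into an AVAL/CNSR data frame plus a strata vector. The sketch below shows one way to do that parsing step; `surv_formula_to_cdisc()` is a hypothetical name, not a visR function, and evaluating the `Surv()` arguments inside `data` (so expressions like `status == 2` work) is a design assumption rather than anything visR currently does.

```r
# Hypothetical sketch, not visR API: convert `Surv(time, event) ~ strata`
# plus a data frame into CDISC-style AVAL/CNSR columns and a strata vector.
surv_formula_to_cdisc <- function(formula, data) {
  if (length(formula) != 3) stop("formula must be two-sided: Surv(time, event) ~ strata")
  lhs <- formula[[2]]
  fn <- if (is.call(lhs)) lhs[[1]] else NULL
  if (is.call(fn)) fn <- fn[[3]]  # accept `survival::Surv(...)` as well as `Surv(...)`
  if (!identical(fn, as.name("Surv"))) stop("left-hand side must be a Surv() call")

  # evaluate the time and event arguments inside `data`
  env <- environment(formula)
  time <- eval(lhs[[2]], data, env)
  event <- eval(lhs[[3]], data, env)

  # let survival::Surv() normalise the event coding (0/1, 1/2, or logical)
  s <- unclass(survival::Surv(time = time, event = event))
  data$AVAL <- s[, "time"]
  data$CNSR <- 1 - s[, "status"]  # CDISC: 1 = censored

  # right-hand side variables become strata; `~ 1` means no strata
  strata <- all.vars(formula[[3]])
  list(data = data, strata = if (length(strata)) strata else NULL)
}
```

A formula method could then call something like `res <- surv_formula_to_cdisc(Surv(time, status) ~ sex, survival::lung)` followed by `visR::estimate_KM(res$data, strata = res$strata)`. Delegating the event recoding to `survival::Surv()` keeps the accepted codings identical to what users already know from `survival::survfit()`.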

This should be two issues. I'll resubmit.