UUPharmacometrics/xpose

Feature Request: Broom compatibility?

Closed this issue · 9 comments

Hi @guiastrennec ,

What do you think about adding broom compatibility?

nlmixr recently added it and it could allow other tidyverse modeling packages to interact with xpose modeling object (like with NONMEM).

I thought you might be interested, or least should be made aware. For me, it was a bit difficult to find out where the parameters were stored initially and this should make it easy to access the parameter estimates (at least)

I have not yet considered this. What do you suggest comes out of tidy(), glance() and augment()?

You get 3rd party package integration.

It will be included in nlmixr; here are some examples of what you can do; Who knows what else would be there in the future:

https://nlmixrdevelopment.github.io/nlmixr/articles/broom.html#dotwhisker

I'm not saying you should, but perhaps you can consider it.

I'd love to have broom (or more accurately broom.mixed) support. tidy() and glance() have rather well-defined semantics (I'd have to double-check what would be expected). augment() would be harder without an engine to run the models (easy for nlmixr, hard for NONMEM).

Note nlmixr does not support augment either. The broom support is not unified or standardized for augment. The preds/ipred/cwres are already in the nlmixr object. Simulations are handled by RxODE. I have a hard time figuring out what augment should do in addition to these already covered items.

Also broom related functions don't actually import broom or broom.mixed but rather generics

As mentioned in #164 I am not in favor of adding broom compatibility nor R model objects features to xpose databases. I believe that these are nice ideas but they would belong to extension packages around xpose. In my opinion one would run into limitation caused by the fact that xpose has no power of running or updating a NONMEM model and several functions commonly used in this context such as predict() could not be implemented.

Can I nudge a reconsideration of the roadmap that model-results-extraction functions (like coef(), vcov(), tidy(), and glance()) be included?

The functions that are focused on extracting information from model objects seem like they would be a direct fit for some parts of xpose. Not the graphics parts, admittedly, but the fact that they would expose the model results in ways that would help interoperability.

As one example in my workflow, I often will make a model in NONMEM then run simulations in mrgsolve (https://github.com/metrumresearchgroup/mrgsolve). For that, I need to access the model estimates including fixed effects, inter-individual variability, and residual variability. I also model in other tools (e.g. nlme() and rstan()) and I sometimes summarize with emmeans(), and having a unified interface to all model results objects helps with interactions with these other tools.

Having these functions in a separate package means that over time due to changes in xpose they will become incompatible with xpose. With them integrated into xpose, they will ensure that tests can be run as part of standard xpose and that issues will not arise with releases of xpose breaking the addon-package.

I know that this is not of interest for package inclusion, but I needed them today, so I wrote the tidy function. Here it is in case someone else needs it (also including the coef and vcov methods from #141):

#' Tidy a xpose_data object
#' 
#' Tidy summarizes information about the components of a model.  For
#' \code{xpose_data} objects, it returns one row per estimated term and columns
#' describing those terms.
#' 
#' @param x An \code{xpose_data} object created by \code{xpose_data()}.
#' @param ... Passed to \code{xpose::get_prm()}
#' @param transform Passed to \code{xpose::get_prm()}
#' @param null_values Values which when in the estimate column should be
#'   considered "not estimated" and would be removed.
#' @param remove_zero_fixed If a value is both fixed and set to zero, should it
#'   be removed from the description?
#' @param conf_level What confidence level should be returned?  (NA indicates no
#'   confidence level returned; otherwise a number between 0 and 1, exclusive.)
#' @return A tibble with the following columns:
#' 
#' \itemize{
#'   \item{effect}{What type of estimate is the parameter? ("theta", "omega", or "sigma")}
#'   \item{term}{The name of the model term (see details)}
#'   \item{fixed}{Is the term fixed (\code{TRUE}) or estimated (\code{FALSE})}
#'   \item{estimate}{The estimated (or fixed) value of the term}
#'   \item{std_error}{The standard error of the term (\code{NA} if fixed or if standard error is not estimated)}
#'   \item{rel_std_error}{The relative standard error of the term (\code{NA} if std.error is \code{NA})}
#'   \item{conf_level}{The confidence level for \code{conf_low} and \code{conf_high}.  This column is only present if \code{conf_level} is not \code{NA}.}
#'   \item{conf_low, conf_high}{The lower and upper bounds of the confidence interval.  These columns are only present if \code{conf_level} is not \code{NA}.}
#' }
#' @export
#' @importFrom dplyr recode select
tidy.xpose_data <- function(x, ..., transform=TRUE, null_values=c(-999, NA), remove_zero_fixed=TRUE, remove_sigma_one_fixed=TRUE, conf_level=NA_real_) {
  prm_tibble <- xpose::get_prm(xpdb=x, ..., transform=transform)
  diag_fun <- ifelse(transform, "sd", "var")
  off_diag_fun <- ifelse(transform, "corr", "covar")
  # Simplify reading of effect types
  prm_tibble$effect <-
    dplyr::recode(
      prm_tibble$type,
      "the"="theta", "ome"="omega", "sig"="sigma",
      .default=NA_character_
    )
  if (any(is.na(prm_tibble$effect))) {
    stop("Error mapping `effect` values")
  }
  
  term_theta <-
    ifelse(
      prm_tibble$effect == "theta" & !(prm_tibble$label %in% ""),
      # Named in the model
      prm_tibble$label,
      ifelse(
        prm_tibble$effect == "theta",
        # Use THETAX if not named in the model
        prm_tibble$name,
        NA_character_
      )
    )
  is_omega <- prm_tibble$effect == "omega"
  is_omega_diag <- is_omega & prm_tibble$m == prm_tibble$n
  is_omega_off_diag <- is_omega & !is_omega_diag
  term_omega_diag <-
    ifelse(
      is_omega_diag & !(prm_tibble$label %in% ""),
      prm_tibble$label,
      prm_tibble$name
    )[is_omega_diag]
  term_omega <-
    ifelse(
      is_omega_diag & !(prm_tibble$label %in% ""),
      term_omega_diag[prm_tibble$m],
      ifelse(
        is_omega_off_diag & !(prm_tibble$label %in% ""),
        prm_tibble$label,
        ifelse(
          is_omega_off_diag,
          sprintf("%s(%s, %s)", off_diag_fun, term_omega_diag[prm_tibble$m], term_omega_diag[prm_tibble$n]),
          NA_character_
        )
      )
    )
  is_sigma <- prm_tibble$effect == "sigma"
  term_sigma <-
    ifelse(
      is_sigma & !(prm_tibble$label %in% ""),
      prm_tibble$label,
      ifelse(
        is_sigma,
        sprintf("%s(%s)", diag_fun, prm_tibble$name),
        NA_character_
      )
    )
  prm_tibble$term <-
    ifelse(
      !is.na(term_theta),
      term_theta,
      ifelse(
        !is.na(term_omega),
        term_omega,
        ifelse(
          !is.na(term_sigma),
          term_sigma,
          NA_character_
        )
      )
    )
  if (any(is.na(prm_tibble$term))) {
    stop("Error mapping `term` values, NAs present")
  } else if (any(prm_tibble$term == "")) {
    stop("Error mapping `term` values, empty strings present")
  }
  
  ret_prep <-
    dplyr::select(
      prm_tibble,
      "effect",
      "term",
      "fixed",
      "estimate"="value",
      "std_error"="se",
      "rel_std_error"="rse"
    )
  if (!is.na(conf_level)) {
    ci <- confint(x, level=conf_level, parm=prm_tibble$name)
    ret_prep$conf_level <- conf_level
    ret_prep$conf_low <- ci[,1]
    ret_prep$conf_high <- ci[,2]
  }
  no_null <- ret_prep[!(ret_prep$estimate %in% null_values), ]
  no_zero_fixed <- no_null[!(remove_zero_fixed & no_null$fixed & no_null$estimate == 0), ]
  no_sigma_1_fixed <-
    no_zero_fixed[
      !(
        remove_sigma_one_fixed &
          no_zero_fixed$effect == "sigma" &
          no_zero_fixed$fixed &
          no_zero_fixed$estimate == 1
      ),
    ]
  # Return the final, filtered values
  no_sigma_1_fixed
}
coef.xpose_data <- function(object, ..., transform=FALSE) {
  ret <- xpose::get_prm(xpdb=object, ..., transform=FALSE)
  setNames(ret$value, ret$name)
}

vcov.xpose_data <- function(object, ...) {
  cov_data <- object$files$data[[which(object$files$extension == "cov")]]
  ret <- as.matrix(cov_data[, setdiff(names(cov_data), "NAME")])
  rownames(ret) <- cov_data$NAME
  ret
}