Updating a mmrm object - a discussion

Question

Updating a mmrm object - a discussion

Closed this issue 8 months ago · 3 comments

Let's say I have some model defined like this (don't mind the meaning, just an example)

mmrm(formula = FVCPostDiff ~ Timepoint + us(Timepoint | PatientId), 
    data = data, control = mmrm_control(method = df, vcov = vcov_type))

where df and vcov_type are provided via parameters (e.g. function arguments or selected from some scenario list).

Now, let's assume I want to calculate the Cook's distance over the residuals.
I need to set the model estimation to reml=FALSE
Or assume I want to add some term or remove some term from the model - same story.

The modification happens outside the original context (=environment) where the model was fit, usually in another function.

> update(m_mmrm, reml=FALSE)

Error in isTRUE(lhs) : object 'vcov_type' not found
Error in mmrm_control(method = df, vcov = vcov_type) :
Assertion on 'method' failed: Must be of type 'character', not 'closure'.

It couldn't find the df and vcov_type.
/ stats::df() is an existing functions, so it reported such strange kind of error. /

This can be fixed manually when fitting the model:

# get the original call referring to external variables:

(orig_call <- paste(deparse(m_mmrm$call), collapse = ""))
[1] "mmrm(formula = FVCPostDiff ~ Timepoint + us(Timepoint | PatientId),     data = data, control = mmrm_control(method = df, vcov = vcov_type))"

# replace these references with actual values. Remember to quote them.

( new_call  <- stringr::str_replace_all(orig_call, c("df" = shQuote(df), "vcov_type" = shQuote(vcov_type))) )
[1] "mmrm(formula = FVCPostDiff ~ Timepoint + us(Timepoint | PatientId),     data = data, control = mmrm_control(method = \"Satterthwaite\", vcov = \"Asymptotic\"))"

# replace the original call
  m_mmrm$call <- str2lang(new_call)

# check

> m_mmrm$call
mmrm(formula = FVCPostDiff ~ Timepoint + us(Timepoint | PatientId), 
    data = data, control = mmrm_control(method = "Satterthwaite", 
        vcov = "Asymptotic"))

Good, let's try it now:

> update(m_mmrm, reml=FALSE)
Error in rep(1, nrow(data)) : invalid 'times' argument

OK, also "data" is not reachable outside the original context. (the message comes from the fact that data() is an existing R function)

Let's try more explicitly, as luckily data is stored in the mmrm object:

> update(m_mmrm, reml=FALSE, data = m_mmrm$data)

mmrm fit

Formula:     FVCPostDiff ~ Timepoint + us(Timepoint | PatientId)
Data:        m_mmrm$data (used 91 observations from 44 subjects with maximum 3 timepoints)
Covariance:  unstructured (6 variance parameters)
Inference:   ML
Deviance:    64.11789

Coefficients: 
     (Intercept) TimepointWeek 16 TimepointWeek 56 
      0.05516954       0.09454553       0.10813262 

Model Inference Optimization:
Converged with code 0 and message: convergence: rel_reduction_of_f <= factr*epsmch

At last! So maybe try this way from the beginning?

OK, let's restart from the original case:

> update(m_mmrm, reml=FALSE, data = m_mmrm$data)
Error in mmrm_control(method = df, vcov = vcov_type) :
Assertion on 'method' failed: Must be of type 'character', not 'closure'.

Let's add "df" and "vcov_type"

> update(m_mmrm, reml=FALSE, data = m_mmrm$data, control = mmrm_control(method = m_mmrm$method, vcov = m_mmrm$vcov))
mmrm fit

Formula:     FVCPostDiff ~ Timepoint + us(Timepoint | PatientId)
Data:        m_mmrm$data (used 91 observations from 44 subjects with maximum 3 timepoints)
Covariance:  unstructured (6 variance parameters)
Inference:   ML
Deviance:    64.11789

Coefficients: 
     (Intercept) TimepointWeek 16 TimepointWeek 56 
      0.05516954       0.09454553       0.10813262 

Model Inference Optimization:
Converged with code 0 and message: convergence: rel_reduction_of_f <= factr*epsmch

OK, now it all works.

Would you consider adding update.mmrm() function which internally cares about these details, extracts the necessary slots and puts them into relevant places?

Answer 1 · 2023-11-30T09:15:24.000Z

Thanks @Generalized , definitely worth looking into this. Can you please add some dummy definitions / code to your example so that it is reproducible so I can run this on my machine?

Answer 2 · 2023-12-01T11:28:14.000Z

Sure. This kind of issue is difficult to figure out, if we work in the global scope, as here all necessary elements are in the current environment.

First, I'm gonna define two 2 datasets in a list.

/why?

In asthma studies it's common to analyze and report together 3 related spirometry parameters: FEV1, FVC and Tiffeneau index. The three parameters are (always) analyzed pre and post using a bronchodilator.

Now, if the sponsor wants the CFB (change from baseline) over time for all three parameters both pre/post to be obtained from model, then I need to fit 6 different models. To avoid typos, mislabelling the output, and self-repeating (DRY), I do it automatically, storing them in a list and iterating through it.
/

datasets <- list( "ds1" = data.frame(id=factor(1:10), response=1:10, time=factor(rep(LETTERS[1:2], each=5))),
                  "ds2" = data.frame(id=factor(1:10), response=1:10, time=factor(rep(LETTERS[1:2], each=5))))

> datasets
$ds1
   id response time
1   1        1    A
2   2        2    A
3   3        3    A
4   4        4    A
5   5        5    A
6   6        6    B
7   7        7    B
8   8        8    B
9   9        9    B
10 10       10    B

$ds2
   id response time
1   1        1    A
2   2        2    A
3   3        3    A
4   4        4    A
5   5        5    A
6   6        6    B
7   7        7    B
8   8        8    B
9   9        9    B
10 10       10    B

Then let's define a function that fits an mmrm.

/why?

In asthma studies I "validate" the mmrm output by GEE, and optionally by robust and/or permutation LMM (suffices for approx. CS and US) depending on case, as it's often formally requested from our Sponsor's biostat. teams and sometimes local regulators). In this area, the distribution of response, normalized & Pearson's residuals can rarely be approximated by the normal one well enough; some other issues happen too. Anyway, I have separate functions to fit MMRM, GEE and other's, to have a clean, simple (KISS) and consistent "API". Here I just show a toy example. /

Note, that the fitting function is parametrized, as for some of the endpoints "US" may not converge, so a different structure is selected (automatically or manually) based on the SAP.

fit_me <- function(ds, method = "Satterthwaite", vcov_type = "Empirical-Bias-Reduced") { 
   mmrm(response ~ time + us(time|id), data=ds, control = mmrm_control(method = method, vcov = vcov_type))
}

Now let's iterate and fit the models to the datasets. Let's also display the function calls to show the problem:

models <- lapply(datasets, fit_me)

> lapply(models, function(m) m$call)

$ds1
mmrm(formula = response ~ time + us(time | id), data = ds, control = mmrm_control(method = method, 
    vcov = vcov_type))

$ds2
mmrm(formula = response ~ time + us(time | id), data = ds, control = mmrm_control(method = method, 
    vcov = vcov_type))

OK, we can see that the model call has 3 references: to [ds], to [method] and to [vcov_type]. If they're not in the scope, evaluating the call (eval) will fail.

Below I define 3 functions "doing something" with the model, that requires re-fitting it with some change (e.g. reml=FALSE):

updates the model naively
updates the model providing the necessary controlling parameters
updates the model providing the controlling params and also the data entry

/ I don't want to complicate the code and put additional parameters to the function, IF they are already stored in the model./

do_something_with_model1 <- function(model) {
    update(model, reml=FALSE)
}

do_something_with_model2 <- function(model) {
    update(model, reml=FALSE, control = mmrm_control(method = model$method, vcov = model$vcov))
}
 
do_something_with_model3 <- function(model) {
    update(model, reml=FALSE, control = mmrm_control(method = model$method, vcov = model$vcov), data = model$data)
}

Let's run them:

> do_something_with_model1(models[[1]])
Error in isTRUE(lhs) : object 'method' not found

> do_something_with_model2(models[[1]])
Error in rep(1, nrow(data)) : invalid 'times' argument

> do_something_with_model3(models[[1]])
mmrm fit

Formula:     response ~ time + us(time | id)
Data:        model$data (used 10 observations from 10 subjects with maximum 2 timepoints)
Covariance:  unstructured (3 variance parameters)
Inference:   ML
Deviance:    35.31024

Coefficients: 
(Intercept)       timeB 
          3           5 
[...]

Answer 3 · 2023-12-11T10:42:54.000Z

Thanks again @Generalized for the discussion and the example! I have looked into this now in detail.

While it is tempting to try to come up with some kind of automated solution for your case, I don't think this is easy, sustainable or standard:

The problem is that we would need to infer which arguments are not available in the current scope, but should be taken instead from the model object itself.
This would modify the default behavior and lead to downstream problems. Hence I don't think this is sustainable.
Also, I think also for other packages or general model fitting calls in R you will face the same situation. Which is fortunately easy to resolve, with relatively readable error messages from R, and with the knowledge you have as developer about what should be taken from the model object. Consider e.g. glmmTMB with a similar example, that behaves in the same way:

library(glmmTMB)

glmm_datasets <- list(
  ds1 = Salamanders,
  ds2 = Salamanders
)
fit_glmm <- function(ds, fam = poisson) {
  glmmTMB(count ~ mined + (1|site), data=ds, family = fam)
}
glmm_models <- lapply(glmm_datasets, fit_glmm)
lapply(glmm_models, \(m) m$call)

do_something_with_glmm_model1 <- function(model) {
  update(model, se = FALSE)
}
do_something_with_glmm_model2 <- function(model) {
  update(model, se = FALSE, family = model$modelInfo$family)
}
do_something_with_glmm_model3 <- function(model) {
  update(model, se = FALSE, family = model$modelInfo$family, data = model$frame)
}

do_something_with_glmm_model1(glmm_models[[1]]) # fails
do_something_with_glmm_model2(glmm_models[[1]]) # fails
do_something_with_glmm_model3(glmm_models[[1]]) # works

Therefore I would propose to close this issue. Thanks again, and I think this discussion and workaround will be helpful in the future for others!