statistikat/VIM

including more general formulas in irmi

matthias-da opened this issue · 1 comments

Currently, irmi only accepts simple specifications for each variable, namely a vector of predictor names, such as

form=list(
  NonD  = c("BodyWgt", "BrainWgt"),
  Dream = c("BodyWgt", "BrainWgt"),
  Sleep = c("BrainWgt"),
  Span  = c("BodyWgt"),
  Gest  = c("BodyWgt", "BrainWgt")
)

However, irmi should also accept any model formula (as long as the correct variable names are used), for example:

library(VIM)
data(sleep)
form = list(
  "log(NonD)  ~ log(BodyWgt) + log(BrainWgt) + I(Sleep^2)",
  "Dream      ~ BodyWgt + BrainWgt:Danger",
  "log(Sleep) ~ BrainWgt * Danger + I(BrainWgt^2)",
  "Span       ~ .",
  "Gest       ~ sqrt(BodyWgt) + Span * Danger"
)
irmi(sleep, modelFormulas = form, trace = TRUE)

Only with this enhancement can irmi outperform other imputation methods and be used efficiently in practice for more complex data and statistical modelling purposes.

I would prefer a syntax without quotes, for example:

form = list(
  log(NonD)  ~ log(BodyWgt) + log(BrainWgt) + I(Sleep^2),
  Dream      ~ BodyWgt + BrainWgt:Danger,
  log(Sleep) ~ BrainWgt * Danger + I(BrainWgt^2),
  Span       ~ .,
  Gest       ~ sqrt(BodyWgt) + Span * Danger
)

As discussed with @matthias-da, transformations on the left-hand side (log(NonD) ~ .) are a bit tricky because determining back-transformations is not trivial. However, I will try to make the right-hand side more flexible. Possibly, the API will also use named lists where the names correspond to the column names:

form = list(
  NonD  = ~ log(BodyWgt) + log(BrainWgt) + I(Sleep^2),
  Dream = ~ BodyWgt + BrainWgt:Danger,
  Sleep = ~ BrainWgt * Danger + I(BrainWgt^2),
  Span  = ~ .,
  Gest  = ~ sqrt(BodyWgt) + Span * Danger
)
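
To illustrate how such a named list might be consumed, here is a minimal sketch in base R. It is not VIM's implementation: `build_model_formula` is a hypothetical helper, and the `toy` data frame only mimics a few columns of `sleep` with made-up values. The helper attaches the list name as the response, while the right-hand-side transformations (`log()`, `I()`) are evaluated by R's usual model machinery.

```r
# Hypothetical helper (not part of VIM): combine a column name with a
# one-sided formula to obtain a two-sided model formula.
build_model_formula <- function(target, rhs) {
  stopifnot(inherits(rhs, "formula"), length(rhs) == 2L)
  # rhs[[2]] is the expression to the right of `~` in a one-sided formula
  f <- eval(call("~", as.name(target), rhs[[2]]))
  environment(f) <- environment(rhs)
  f
}

# Toy data with made-up values, standing in for VIM's sleep data
toy <- data.frame(
  NonD     = c(2.1, 9.1, 15.8, 5.2, 10.9),
  BodyWgt  = c(1.0, 3.4, 0.9, 160.0, 10.5),
  BrainWgt = c(6.6, 44.5, 5.7, 169.0, 179.5),
  Sleep    = c(8.3, 12.5, 16.5, 3.9, 9.8)
)

form <- list(NonD = ~ log(BodyWgt) + I(Sleep^2))

# The list name supplies the response, the formula supplies the predictors
f <- build_model_formula("NonD", form$NonD)

# Transformed predictors are computed when the design matrix is built
X <- model.matrix(form$NonD, data = toy)

# The two-sided formula can be passed to any model-fitting function
fit <- lm(f, data = toy)
```

Because the response is always a plain column name in this layout, no back-transformation is needed; `all.vars(form$NonD)` lists the columns a formula depends on, which could be used to validate variable names up front.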

Using named lists would make it fairly straightforward to maintain backward compatibility:

normalize_model_formulas <- function(form) {
  lapply(
    form,
    function(x) {
      if (is.character(x))
        # old-style character specification: convert it
        convert_to_formula(x)
      else
        # already a formula: keep it as is
        x
    }
  )
}
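
`convert_to_formula` is not defined in this thread. As a rough sketch, and assuming the old-style entries are character vectors of predictor names as in the first example, it could be built on `stats::reformulate()`. The definition below is an assumption for illustration, not the package's code; `normalize_model_formulas` is repeated so the block runs on its own.

```r
# Assumed behaviour: an old-style entry is a character vector of predictor
# names, e.g. c("BodyWgt", "BrainWgt"), which becomes ~ BodyWgt + BrainWgt.
convert_to_formula <- function(x) {
  stopifnot(is.character(x), length(x) > 0L)
  reformulate(x)
}

normalize_model_formulas <- function(form) {
  lapply(form, function(x) if (is.character(x)) convert_to_formula(x) else x)
}

# Old-style and formula-style entries can be mixed in one list
mixed <- list(
  Sleep = c("BrainWgt"),
  NonD  = c("BodyWgt", "BrainWgt"),
  Gest  = ~ sqrt(BodyWgt) + Span * Danger
)

out <- normalize_model_formulas(mixed)
print(out)
```

After normalization every entry is a one-sided formula, so downstream code only has to handle a single representation.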