florianhartig/DHARMa

Adding support for multinomial family in mgcv::gam

heeringa0 opened this issue · 4 comments

Hi,

Thanks for developing and making available the DHARMa package.
I am trying to use this for diagnosing a multinomial gam model:

library(mgcv)
library(DHARMa)

all <- list()

all$formula <- formula(Type ~ s(Frisian_proficiency) +
s(IQ) +
te(IQ , by = Measurement) +
s(Subject, bs="re") +
s(Measurement, bs="re"), data = df)

all$data <- df

f <- replicate(2, update(all$formula, NULL~.))
f[[1]] <- update(f[[1]], Type~.)

model.gam <- gam(f, data=df, family=multinom(2), method="fREML")
summary(model.gam) # works!

simRes <- simulateResiduals(fittedModel = model.gam, plot = F)

Here I get an error:
Error in match.arg(type) : 'arg' should be “deviance”

What solution woud you recommend?

Hi,

thanks for reporting this - I can't run it because I don't have the df object, but I can reproduce your problem with an example from the help, see below.

It is not surprising that you get an error because I haven't implemented any support in DHARMa for multinomial logic or clogit GLMM, as I wasn't really sure how to define quantile residuals for these distributions. The reason why you didn't get a better error straight away is that I overlooked that mgcv also provides a multinomial.

On the positive side of things:

  • I just checked, although mgcv does not provide a simulate function, mgcViz does. So, I can produce simulations from mgcv multinomial models (which is the first prerequisite for supporting this in DHARMa)
  • There is a recent paper that discusses how to define quantile residuals for the multinomial https://onlinelibrary.wiley.com/doi/10.1002/sam.11645?af=R

I will take a look at this paper and see if that could be implemented in DHARMa. If so, I will report it here. Of course, feel free to add further thoughts on this if you have any.

Best
Florian

library(mgcv)
set.seed(6)
## simulate some data from a three class model
n <- 1000
f1 <- function(x) sin(3*pi*x)*exp(-x)
f2 <- function(x) x^3
f3 <- function(x) .5*exp(-x^2)-.2
f4 <- function(x) 1
x1 <- runif(n);x2 <- runif(n)
eta1 <- 2*(f1(x1) + f2(x2))-.5
eta2 <- 2*(f3(x1) + f4(x2))-1
p <- exp(cbind(0,eta1,eta2))
p <- p/rowSums(p) ## prob. of each category 
cp <- t(apply(p,1,cumsum)) ## cumulative prob.
## simulate multinomial response with these probabilities
## see also ?rmultinom
y <- apply(cp,1,function(x) min(which(x>runif(1))))-1
## plot simulated data...
plot(x1,x2,col=y+3)

## now fit the model...
b <- gam(list(y~s(x1)+s(x2),~s(x1)+s(x2)),family=multinom(K=2))
plot(b,pages=1)
gam.check(b)
summary(b)


simRes <- simulateResiduals(fittedModel = b)
simulate(b) # simulations not implemented 
library(mgcViz)
simulate(b) # mgcViz would allow simulations 

Hi Florian,

Maybe this way?

sim_gam <- mgcViz::simulate.gam(b, nsim=100)

sim_res_gam_1 <- createDHARMa(simulatedResponse       = sim_gam,
                              observedResponse        = b$model$y,
                              fittedPredictedResponse = predict(b)[,1],
                              integerResponse         = T)

DHARMa::plotQQunif(sim_res_gam_1)

sim_res_gam_2 <- createDHARMa(simulatedResponse       = sim_gam,
                             observedResponse        = b$model$y,
                             fittedPredictedResponse = predict(b)[,2],
                             integerResponse         = T)

DHARMa::plotQQunif(sim_res_gam_2)

Best,
Wilbert

Hi Wilbert,

yes, you can formally do that, and I don't think it will hurt, in the sense that if the model is correctly specified, you should see nothing, but the question is if the standard plots make sense to diagnose problems in the multinomial.

I don't think the standard plots are very informative. For example, if standard plots show a dispersion problem, it would likely not be dispersion, but rather a sign that some classes are over predicted.

I think in the end I will have to do special tests / plots for multi-class responses. What I could imagine is to transform the res ~ observed plot such that you see a res ~ predicted as done in the last example below for all classes in one plot.

What one could also check already would be using the testGeneric if the number of predicted 1,2,3 corresponds to the observed, see also example below.

I think it will take some trying around to find out which diagnostics are informative for this type of models!

# testing if missing predictor would show up

b <- gam(list(y~s(x1),~s(x1)),family=multinom(K=2))
plot(b,pages=1)
gam.check(b)
summary(b)


sim_gam <- mgcViz::simulate.gam(b, nsim=100)

sim_res_gam_1 <- createDHARMa(simulatedResponse       = sim_gam,
                              observedResponse        = b$model$y,
                              fittedPredictedResponse = predict(b)[,1],
                              integerResponse         = T)

plot(sim_res_gam_1)
plotResiduals(sim_res_gam_1, form = x2)


# creating plots just for class 2

makeBinary = function(x) as.numeric(x == 2)
sim <- apply(sim_gam,1:2,makeBinary) 

sim_res_gam_1 <- createDHARMa(simulatedResponse       = sim,
                              observedResponse        = makeBinary(b$model$y),
                              fittedPredictedResponse = predict(b)[,2],
                              integerResponse         = T)

plot(sim_res_gam_1)
plotResiduals(sim_res_gam_1, form = x2)

# checking for correct number of predictions in class 2
# as we transformed to 0/1 we can just repurpose the 
# testZeroInflation function 

testZeroInflation(sim_res_gam_1)

Thanks Florian,

A res ~ predicted for all classes in one plot sounds like a good idea.

Best,
Wilbert