femiguez/nlraa

simulate_gnls with psim=2 and data that are a new length does not assign residuals correctly

billdenney opened this issue · 10 comments

There is a warning that "longer object length is not a multiple of shorter object length", indicating that there is an issue with assigning residuals in `simulate_gnls(..., psim = 2)`.

I believe that the error is related to this line of code:

rsds <- rsds.std * attr(residuals(object), "std") ## This last term is 'sigma'

library(nlraa)
library(nlme)

fm1 <-
  gnls(
    weight ~ SSlogis(Time, Asym, xmid, scal),
    Soybean,
    weights = varPower()
  )

new_data <- data.frame(Time=14:84)

try0 <- simulate_gnls(fm1, data=new_data, psim=0)
try1 <- simulate_gnls(fm1, data=new_data, psim=1)
# the warnings here indicate that there is a problem assigning residuals to the
# correct length vector
try2 <- simulate_gnls(fm1, data=new_data, psim=2)
#> Warning in rsds.std * attr(residuals(object), "std"): longer object length is
#> not a multiple of shorter object length
#> Warning in val + rsds: longer object length is not a multiple of shorter object
#> length

Created on 2021-08-10 by the reprex package (v2.0.0)

Thanks for the report, Bill! I am aware of this issue, and I should probably include a better error message. The 'data' argument should really only be used when the data are out of scope, as in the examples you were running before. It should be identical to the data used to fit the model, and I do not have a good way of checking that the data passed in this way are indeed identical to the data used for fitting (maybe I need to think more about this). I could check that it has the same dimensions and throw an error if it does not match.

There is a 'hidden' argument 'newdata' which works, but it is only compatible with psim = 0 or 1, not 2 - I did not intend to make this available to the user. The issue is that I cannot create a variance-covariance matrix for the new data, because I do not know how to extend its error structure (I'm not sure this is possible).

I do think that if psim = 2 is only supported on the original data, the above should be an error.

I think that the function nlme::varWeights() may be able to help (or at least provide an instructive example of how to help). For simple additive residual error structures, I would think that rnorm(n, mean = 0, sd = [somehow extract the residual standard deviation]) should usually work. For other variance structures, I hope that nlme::varWeights() can suggest how to do it.
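A minimal sketch of that idea, assuming the `fm1` fit from the reprex above. `varWeights()` returns per-observation weights on the fitted data, so the per-observation residual standard deviation can be recovered as `sigma / weight` (this should match the 'std' attribute that `simulate_gnls` uses internally):

```r
library(nlme)

fm1 <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
            weights = varPower())

## Per-observation residual standard deviations for the *fitted* data:
## varWeights() gives the inverse sd multipliers w_i, so sd_i = sigma / w_i.
## This should agree with attr(residuals(fm1), "std").
w   <- nlme::varWeights(fm1$modelStruct$varStruct)
sds <- fm1$sigma / w

## Observation-level noise, as in psim = 2 on the original data:
set.seed(101)
sim <- fitted(fm1) + rnorm(length(sds), mean = 0, sd = sds)
```

This only covers the original data, since `varWeights()` is evaluated at the observations used for fitting; the open question in this thread is how to extend those weights to newdata.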

Yes, I just changed the code so that better error messages are reported in some of these cases. Thanks for the report!

This is why the 'newdata' argument is hidden. :) For existing data, the model does give me a std for each observation, but when newdata are used it is tricky, because in reality I should be propagating the uncertainty, considering that these are not part of the data used to fit the model. I could ignore this for now and interpolate the stds for these new observations, but it could be unreliable if we are extrapolating. I have not had the need to implement this, and I also have no idea how to extend it to a model with correlated errors.

If you really need this feature, I can try to implement it for the case of unequal variance, but not for correlated errors.
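For a simple structure like varPower with its default variance covariate (the fitted values), the stds for new observations could in principle be computed directly rather than interpolated, since Var(e_i) = sigma^2 * |mu_i|^(2 * delta). A hedged sketch, assuming the `fm1` fit above and that `coef()` on the varPower structure returns the estimated power:

```r
library(nlme)

fm1 <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
            weights = varPower())

new_data <- data.frame(Time = 15:90)

## Mean predictions for the new data
mu <- predict(fm1, newdata = new_data)

## Estimated power parameter of the varPower structure
delta <- coef(fm1$modelStruct$varStruct)

## For varPower with the fitted values as covariate:
## sd_i = sigma * |mu_i|^delta
sds <- fm1$sigma * abs(mu)^delta

## Observation-level (psim = 2-style) draw on the new data
set.seed(101)
sim <- mu + rnorm(length(mu), mean = 0, sd = sds)
```

This sidesteps interpolation entirely for this one structure, though it inherits the same extrapolation caveat, and it says nothing about correlated errors.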

I don't really need it right now (my main need was psim=1 which is working for me), but I do think that it would be very helpful for some of my projects.

I thought about this a bit more and while I will try to implement it, it won't be trivial.

Also, notice that the stats::simulate.lm function does not have a newdata argument; for that purpose stats::predict.lm should be used. In my package, however, I need the nlraa::simulate_* functions in order to implement the predict_* and boot_* functions, so I think the better design is that if you really want predictions on newdata, you should use the nlraa::predict_nlme function. I do realize that this might not be what you need, since it reports intervals and not observation-level data.

Right now in the package I have this:

library(nlme)
library(nlraa)
fm1 <- gnls(weight ~ SSlogis(Time, Asym, xmid, scal), Soybean,
             weights = varPower())
dat <- data.frame(Time = 15:90)
## This does not work
prd <- predict_nlme(fm1, interval = "pred", newdata = dat)
## Error: At this point newdata is not compatible with observation-level simulation
## This works fine
prd2 <- predict_nlme(fm1, interval = "conf", newdata = dat)

In any case, I will think more about the best way to implement it.

I was in the process of using simulate_gnls() to make confidence intervals; that was my underlying goal! Thanks for simplifying that and pointing out that the predict_nlme() function can also work for gnls models.

If you are interested in confidence intervals, take a look at the vignette 'Confidence-bands' which is here in github, but not in the CRAN version of the package. Also here:
https://femiguez.github.io/nlraa-docs/Confidence-Bands.html

I have now implemented the case of having newdata and psim = 2 (observation-level simulation, and thus prediction intervals) for some variance structures. I wrote specific code for 'varIdent', 'varExp', 'varPower' and 'varFixed'. I also wrote some tests and it seems to be working, but I will keep testing. I'm still thinking about how to deal with correlated errors and do not have a good approach yet. It seems to make the predict_ functions much slower, though, so there is room for improvement.

Thanks! I'll give it a try. (I will try the next time I need something like it, but it may be a while.)

For the most part this is implemented. Dealing with correlated errors is a potential enhancement down the road; I don't have a good approach yet.