pbs-assess/sdmTMB

get_index() over alternate time variable?

seanhardison1 opened this issue · 3 comments

I've fitted an IID spatiotemporal model using time = "year" in addition to a factor-smooth interaction for year month: s(month, year_fac, bs = 'fs'). This is convenient for me because I can predict over months within years without needing to fit the ST random field by month.

Ultimately, I would like to generate a biomass index by month that includes SEs, and I've found that I can do this by simulating predictions from the joint posterior distribution (predict(..., nsim = ...)) and then summarizing them using get_index_sims(...). To do this, I updated get_index_sims so that it alters the time attribute passed from the fitted model from year to a yearmonth numeric.

area_function = function(x, area) x + log(area)
agg_function = function(x) sum(exp(x))
est_function = stats::median

make_index <- 
  function(pred_sims = pred_out_seas_sims, 
           pred_grid = pred_df_glmm_filt,
           area = 6.25){
    
    row.names(pred_sims) <- pred_grid %>% 
      pull(ymon)
    
    attr(pred_sims, "time") <- "ymon"
    .time_attr <- attr(pred_sims, "time")
    pred_sims <- apply(pred_sims, 2L, function(x) area_function(x, area))
    .t <- as.numeric(rownames(pred_sims))
    ymons <- sort(unique(.t))
    ymon_indexes <- lapply(ymons, function(x) which(.t %in% x))
    out1 <- lapply(ymon_indexes, function(x) {
      apply(pred_sims[x, , drop = FALSE], 2L, agg_function)
    })
    out <- lapply(out1, function(x) {
      data.frame(est = est_function(x),
                 lwr = stats::quantile(x,
                                       probs = (1 - 0.95)/2),
                 upr = stats::quantile(x,
                                       probs = 1 - (1 - 0.95)/2), 
                 se = stats::sd(log(x)),
                 log_est = mean(log(x)))
    })
    out <- do.call("rbind", out)
    out[[.time_attr]] <- ymons
    out <- out[, c(.time_attr, "est", "lwr", "upr", "log_est", 
                   "se"), drop = FALSE]
    return(out)
  }

This seems to work for me. However, it appears to be much more challenging to do something similar with get_index because of outputs from predict when return_tmb = T (not a simple attribute swap).

So my question is, is it feasible to generate an index using get_index that does not use the time attribute of the ST random field? This would be desirable when it is computationally infeasible to estimate the ST random field at sub-annual temporal resolutions, and when the analyst (me!) would like SEs returned with the index without knowledge of TMB.

Can you calculate predictions and do the Monte Carlo integration (summing the grid cell values weighted by area) for each month at a time? get_index() is just summing over whatever you give it grouped by the time variable.

index <- lapply(month_vector, \(m) {
  # make your newdata grid here for a given month
  nd <- subset(grid, month == m) # or similar
  p <- predict(fit, newdata = nd, return_tmb_object = TRUE)
  get_index(p, bias_correct = TRUE)
})
index <- do.call(rbind, index)

If memory happens to be an issue, you could also change month_vector to a year-month vector.

A full example:

library(sdmTMB)
d <- dogfish
d$month <- rep_len(1:12, nrow(d))

d$year_fac <- factor(d$year)

fit <- sdmTMB(
  catch_weight ~ s(month, year_fac, bs = 'fs'), 
  data = d, 
  time = "year", 
  spatial = "off", 
  spatiotemporal = "off", 
  family = tweedie()
)

grid <- replicate_df(sdmTMB::wcvi_grid, time_name = "year", time_values = unique(dogfish$year))
grid$year_fac <-  factor(grid$year)

index <- lapply(1:12, \(m) {
  nd <- dplyr::mutate(grid, month = m)
  p <- predict(fit, newdata = nd, return_tmb_object = TRUE)
  ii <- get_index(p, bias_correct = TRUE)
  ii$month <- m
  ii
})
index <- do.call(rbind, index)

head(index)
#>   year      est       lwr       upr  log_est        se month
#> 1 2004 735024.0 284545.22 1898679.6 13.50766 0.4841981     1
#> 2 2006 146673.0  60951.51  352952.0 11.89596 0.4480321     1
#> 3 2008 668907.7 310499.33 1441025.7 13.41340 0.3915706     1
#> 4 2010 610885.8 272372.10 1370116.4 13.32267 0.4121203     1
#> 5 2012 182703.0  67879.94  491756.5 12.11562 0.5051733     1
#> 6 2014 109097.7  43682.18  272475.2 11.60000 0.4670002     1
tail(index)
#>     year       est      lwr      upr  log_est        se month
#> 115 2012  58635.02 22658.67 151732.9 10.97909 0.4851056    12
#> 116 2014  87922.85 29141.26 265274.3 11.38421 0.5634311    12
#> 117 2016  83998.63 27905.75 252842.9 11.33856 0.5622388    12
#> 118 2018 110949.10 44259.81 278123.7 11.61683 0.4688833    12
#> 119 2021  90260.13 35407.12 230091.9 11.41045 0.4774490    12
#> 120 2022 231489.82 97390.52 550233.6 12.35229 0.4417464    12

Created on 2024-04-24 with reprex v2.1.0

Ah you beat me to it! Yep, that works for me. Thank you for your help as always. Case closed I think