boost-R/FDboost

Fix bootstrapCI

davidruegamer opened this issue · 14 comments

bootstrapCI throws errors for all m4 models (due to nested hmatrix indexing) and for ms1 and ms2, because in these cases some of the base-learners are never selected (which is not yet handled). Everything else in the test file seems to work, except for the models for which applyFolds or another validation function fails (e.g., the binomial case).

@sbrockhaus regarding the hmatrix problem: I like your idea of adding an argument to applyFolds or reweightData that allows overwriting the ids in the hmatrix object, or expanding the data appropriately, if the object has already been subsetted by a bootstrap-like procedure. I will try to implement this functionality and set the option to TRUE in bootstrapCI for the inner resampling function.
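To make "expanding the data and overwriting the ids" concrete, here is a minimal base-R sketch. It assumes long-format data in which every row carries the id of the curve it belongs to; `id`, `draw`, `rows` and `new_id` are hypothetical names, and this is not the FDboost hmatrix code. Curves drawn several times by the outer bootstrap are duplicated and receive fresh ids, so that a nested resampling step treats each copy as a separate observation.

```r
# Hypothetical long-format data: each row belongs to the curve given in `id`
# (stands in for the id information stored in an hmatrix object)
id <- c(1, 1, 2, 2, 3, 3)

# curves selected by one outer bootstrap draw; curve 2 is drawn twice
draw <- c(2, 2, 3)

# expand the data: collect the rows of every drawn curve, duplicates included
rows <- unlist(lapply(draw, function(i) which(id == i)))

# overwrite the ids so that each drawn copy becomes its own curve 1, 2, 3, ...
n_rows_per_draw <- vapply(draw, function(i) sum(id == i), integer(1))
new_id <- rep(seq_along(draw), times = n_rows_per_draw)

rows    # 3 4 3 4 5 6
new_id  # 1 1 2 2 3 3
```

Without the renumbering, both copies of curve 2 would share one id, and an inner resampling step based on ids would always keep or drop them together.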

Fixed the problem for ms1 and ms2 by handling intercepts for SOF models. Hopefully I did not break anything else...

Fixed the hmatrix problem with nested use of applyFolds. To do: fix the intercept problem for scalar response.

  • fix behaviour for factor-specific historical effects
  • historical effect with irregular response

Is it intentional that rows and columns are swapped for the effect of a factor base-learner?

```r
library(FDboost)
library(refund)  # provides pffrSim()

#########
# data simulated from a model with linear functional effect
# Y(t) = f(t) + \int X1(s)\beta(s,t)ds + eps
set.seed(2121)
data1 <- pffrSim(scenario = "ff", n = 40)
data1$X1 <- scale(data1$X1, scale = FALSE)
dat_list <- as.list(data1)
dat_list$t <- attr(data1, "yindex")
dat_list$s <- attr(data1, "xindex")

# add a factor with two levels
dat_list$z <- gl(n = 2, k = 1, length = nrow(dat_list$Y))

## model fit by FDboost: smooth intercept plus a factor effect
m1 <- FDboost(Y ~ 1 + bolsc(z, df = 1),
              timeformula = ~ bbs(t, knots = 5), data = dat_list,
              control = boost_control(mstop = 21))

## Not run:
# a short example with a not very meaningful number of folds
bootCIs <- bootstrapCI(m1, B_inner = 3, B_outer = 5)

str(bootCIs$raw_results[2:3])
```

The call to str() gives the following output:

```
List of 2
 $ "bols(ONEx, intercept = FALSE, df = 1) %A0% bbs(t, knots = 5, df = 4)": int [1:5, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
 $ "bolsc(z, df = 1) %O% bbs(t, knots = 5)"                              :List of 2
  ..$ : num [1:40, 1:5] -0.0293 -0.0262 -0.0228 -0.0193 -0.0159 ...
  ..$ : num [1:40, 1:5] 0.0265 0.0237 0.0207 0.0175 0.0143 ...
```

As expected, the first effect is a B_outer x 40 matrix, whereas the second effect is a list of 40 x B_outer matrices (one per factor level).

This is because all coefficients except the offset are extracted with lapply, whereas sapply is used for the offset. For the offset, the behaviour of sapply is clear; for the other effects, the coefficients are lists (of lists), and there the behaviour of sapply is, in my opinion, not so clear. But if you would like to change it, I have no objection.
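The difference between the two extraction styles can be reproduced in base R alone. In the sketch below, `vec_fits` and `list_fits` are hypothetical stand-ins for the per-fold coefficients (a vector per fold for the offset, a list with one vector per factor level for a bolsc() effect); it is not the bootstrapCI source code.

```r
B_outer <- 5
n_t <- 40

# one coefficient vector per bootstrap fold (like the offset)
vec_fits <- lapply(seq_len(B_outer), function(b) rep(b, n_t))
# sapply() binds equal-length vectors as columns (n_t x B_outer);
# t() turns this into one row per fold
offset_mat <- t(sapply(vec_fits, identity))
dim(offset_mat)  # 5 40

# one list per fold, with one vector per factor level (like bolsc(z))
list_fits <- lapply(seq_len(B_outer),
                    function(b) list(rep(b, n_t), rep(-b, n_t)))
# here sapply() returns a 2 x 5 matrix of mode "list", which is awkward to use
simplified <- sapply(list_fits, identity)
is.list(simplified)  # TRUE

# looping over the levels instead gives one n_t x B_outer matrix per level
per_level <- lapply(1:2, function(l) sapply(list_fits, `[[`, l))
dim(per_level[[1]])  # 40 5
```

This reproduces the two shapes reported by str(): a B_outer x 40 matrix for the vector case and a list of 40 x B_outer matrices for the factor case.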

I don't think the answer is that simple. Consider the following object, which belongs to the model with formula = Y ~ 1 + bbsc(xsmoo, df = 3) + bolsc(z, df = 3) (see all_effectsTEST.R):

```
> str(test_bbsc$raw_results)
List of 4
 $ offsets                                                                    : num [1:3, 1:40] 0.979 1.099 1.096 2.279 2.173 ...
  ..
 $ "bols(ONEx, intercept = FALSE, df = 1) %A0% bbs(tvals, knots = 10, df = 9)": int [1:3, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
  ..
 $ "bbsc(xsmoo, df = 3) %O% bbs(tvals, knots = 10, df = 3)"                    : num [1:3, 1:1600] 1.35 1.27 1.19 1.23 1.16 ...
  ..
 $ "bolsc(z, df = 3) %O% bbs(tvals, knots = 10, df = 3)"                       :List of 2
  ..$ : num [1:40, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ : num [1:40, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
```

This is fixed now. I also had to use droplevels for your example, because z has three levels but only two of them actually occur.
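Why droplevels is needed: an R factor keeps all of its declared levels even if some never occur, so any code that builds one coefficient per level also builds one for the empty level. The base-R sketch below uses a hypothetical factor `z`, not the one from all_effectsTEST.R.

```r
# a factor with three declared levels, only two of which occur
z <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
levels(z)    # "a" "b" "c"
table(z)     # level "c" has count 0

# droplevels() removes the unused level
z2 <- droplevels(z)
levels(z2)   # "a" "b"

# subsetting also keeps unused levels; droplevels() removes them again
z_sub <- z[z == "a"]
nlevels(z_sub)              # 3
nlevels(droplevels(z_sub))  # 1
```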

Thanks!

Models that still do not pass the test:

  • m3i -> error in reweightData: Length of weights and number of observations do not match!
  • m4 -> missing value in names(listOfCoefs)[i] != "offsets"
  • m4i, m4ii, m4iii -> see m3i

Made first improvements to handle factor-specific historical effects (in particular, fixing the error with m4). Proper handling in the plot.bootstrapCI function and bug fixes for the other models (which cause reweightData to fail) are still missing.

The error for m3i occurs in applyFolds() due to the argument redefineWeights = TRUE.
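The message reported for m3i ("Length of weights and number of observations do not match!") points to a length check between a weight vector and the number of observations. The base-R sketch below shows one way such a mismatch can arise; `check_weights()` is a hypothetical helper that mimics the check and is not the actual reweightData() code.

```r
# Hypothetical helper mimicking the length check behind the error message;
# it is not the actual reweightData() code
check_weights <- function(weights, n_obs) {
  if (length(weights) != n_obs)
    stop("Length of weights and number of observations do not match!")
  invisible(TRUE)
}

set.seed(2121)
n_curves <- 40
# bootstrap weights: how often each of the 40 curves is drawn (they sum to 40)
w_boot <- as.vector(rmultinom(1, size = n_curves, prob = rep(1, n_curves)))
check_weights(w_boot, n_curves)  # passes silently

# data already reduced to the distinct curves drawn in the outer bootstrap,
# but weights still defined on all 40 curves: the check fails
n_sub <- sum(w_boot > 0)
ok <- tryCatch(check_weights(w_boot, n_sub), error = function(e) FALSE)
ok  # FALSE
```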
Test added.

It seems that the problem is not (just) the factor-specific historical effect but an incorrect subsetting behaviour for hmatrix objects. I will try to move this part into a separate hmatrix subsetting function to give the code a better structure.