Fix bootstrapCI
davidruegamer opened this issue · 14 comments
bootstrapCI throws errors for all m4 models (due to nested hmatrix indexing) and for ms1 as well as ms2, as in this case some of the baselearners are not selected at all (which is still not handled). Everything else in the test file seems to work (except for those models for which applyFolds or other validation functions fail, e.g., the binomial case).
@sbrockhaus regarding the hmatrix problem: I like your idea to add an option / argument to applyFolds or reweightData in order to have the possibility to overwrite the ids in the hmatrix object or expand the data appropriately if the object has already been subsetted by a bootstrap-like procedure. I would try to implement such a functionality and use the option TRUE in bootstrapCI for the inner resampling function.
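The "expand the data" idea can be illustrated in base R without touching FDboost internals. This is only a sketch under stated assumptions: the variable names are hypothetical, and the actual hmatrix id handling in applyFolds or reweightData may look quite different. It shows how integer bootstrap weights map to an expanded set of row indices, which can then be given fresh ids before an inner resampling step.

```r
# Sketch only: base R, no FDboost functions involved.
set.seed(1)
n <- 6                                   # number of original curves / ids

# Outer bootstrap draw, expressed as indices and as integer weights
boot_idx <- sample.int(n, n, replace = TRUE)
w <- tabulate(boot_idx, nbins = n)       # how often each original id was drawn

# Expand: repeat each original id according to its weight
expanded_ids <- rep(seq_len(n), times = w)

# Overwrite ids so that duplicated curves count as distinct observations
# in a subsequent (inner) resampling step
new_ids <- seq_along(expanded_ids)

# Mapping from new ids back to the original ids
data.frame(new_id = new_ids, original_id = expanded_ids)
```

With fresh ids, an inner fold can leave out one copy of a duplicated curve while keeping another, which is the situation a nested resampling scheme has to cope with.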
Fixed the problem for ms1 and ms2 by handling intercepts for SOF models. Hopefully I did not disimprove everything...
Fixed hmatrix problem with nested use of applyFolds. Todo: Fix intercept problem for scalar response.
- fix behaviour for factor-specific historical effects
- historical effect with irregular response
Is it on purpose that rows and columns are the other way round for the effect of a factor base-learner?
library(FDboost)
#########
# model with linear functional effect, use bsignal()
# Y(t) = f(t) + \int X1(s)\beta(s,t)ds + eps
set.seed(2121)
data1 <- pffrSim(scenario = "ff", n = 40)
data1$X1 <- scale(data1$X1, scale = FALSE)
dat_list <- as.list(data1)
dat_list$t <- attr(data1, "yindex")
dat_list$s <- attr(data1, "xindex")
dat_list$z <- gl(n=2, k=1, length=nrow(dat_list$Y))
## model fit by FDboost
m1 <- FDboost(Y ~ 1 + bolsc(z, df=1),
timeformula = ~ bbs(t, knots = 5), data = dat_list,
control = boost_control(mstop = 21))
## Not run:
# a short example with not so meaningful number of folds
bootCIs <- bootstrapCI(m1, B_inner = 3, B_outer = 5)
str(bootCIs$raw_results[2:3])
The str() call gives the output:
List of 2
$ "bols(ONEx, intercept = FALSE, df = 1) %A0% bbs(t, knots = 5, df = 4)": int [1:5, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
$ "bolsc(z, df = 1) %O% bbs(t, knots = 5)" :List of 2
..$ : num [1:40, 1:5] -0.0293 -0.0262 -0.0228 -0.0193 -0.0159 ...
..$ : num [1:40, 1:5] 0.0265 0.0237 0.0207 0.0175 0.0143 ...
The first effect is, as expected, a B_outer x 40 matrix;
the second effect is a list of 40 x B_outer matrices.
It's due to the fact that all coefficients but the offset are extracted with lapply, while for the offset sapply is used (since in this case the behaviour of sapply is clear -- in contrast to other cases, in which the coefficients are lists (of lists) and the behaviour of sapply is imo not that clear). But if you would like to change it, I have nothing against it.
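The difference between the two extractors can be reproduced in base R. The following is a sketch with hypothetical stand-in functions (one_fold and one_fold_list are not part of FDboost); it only illustrates why sapply is predictable for plain numeric vectors and less obvious once each fold returns a list.

```r
set.seed(1)
B <- 5

# Hypothetical stand-in for one bootstrap fold: 40 coefficient values
one_fold <- function(b) rnorm(40)

# sapply() simplifies equal-length numeric results to a matrix
# with one column per fold, i.e. 40 x B
via_sapply <- sapply(seq_len(B), one_fold)
dim(via_sapply)

# lapply() always returns a plain list with one element per fold
via_lapply <- lapply(seq_len(B), one_fold)

# Binding explicitly makes the orientation (B x 40) unambiguous
via_rbind <- do.call(rbind, via_lapply)
dim(via_rbind)

# If each fold returns a list (e.g. one matrix per factor level),
# sapply() yields a list-matrix rather than a numeric array
one_fold_list <- function(b) list(matrix(0, 2, 2), matrix(0, 2, 2))
via_sapply_list <- sapply(seq_len(B), one_fold_list)
dim(via_sapply_list)
```

Using lapply everywhere and combining the results explicitly (rbind or cbind) would be one way to get the same orientation for all effects.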
I don't think that the answer is that easy. Consider the following object, which belongs to the model with formula = Y ~ 1 + bbsc(xsmoo, df = 3) + bolsc(z, df = 3) -- see all_effectsTEST.R
> str(test_bbsc$raw_results)
List of 4
$ offsets: num [1:3, 1:40] 0.979 1.099 1.096 2.279 2.173 ...
..
$ "bols(ONEx, intercept = FALSE, df = 1) %A0% bbs(tvals, knots = 10, df = 9)": int [1:3, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
..
$ "bbsc(xsmoo, df = 3) %O% bbs(tvals, knots = 10, df = 3)" : num [1:3, 1:1600] 1.35 1.27 1.19 1.23 1.16 ...
..
$ "bolsc(z, df = 3) %O% bbs(tvals, knots = 10, df = 3)" :List of 2
..$ : num [1:40, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
..$ : num [1:40, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
Is fixed now. Also had to use droplevels for your example, because z has three levels but only two different manifestations.
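For reference, the effect of droplevels on a factor with an unused level looks like this in base R. The factor below is a made-up toy example, not the z from the test file.

```r
# Toy factor: three declared levels, only two of them observed
z <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
nlevels(z)        # counts the unused level "c" as well

# droplevels() removes levels that do not occur in the data
z_used <- droplevels(z)
nlevels(z_used)
levels(z_used)
```

Without this step, code that loops over levels(z) would expect one coefficient surface per declared level, including the level that never occurs.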
Thanks!
Models that still do not pass the test:
- m3i -> error in reweightData: Length of weights and number of observations do not match!
- m4 -> missing value in names(listOfCoefs)[i] != "offsets"
- m4i, m4ii, m4iii -> see m3i
Made first improvements to handle factor-specific historical effects (in particular correcting the error with m4). A proper handling in the plot.bootstrapCI function and bug fixes for the other models (which cause reweightData to fail) are still missing.
The error for m3i occurs in applyFolds() due to the argument redefineWeights = TRUE.
Test added.
It seems like the problem is not (just) the factor-specific historical effect but actually an incorrect subsetting behaviour for hmatrix objects. I will try to outsource this part and write a separate hmatrix subsetter in order to get things better structured.