Problems with independent factor variables
Closed this issue · 2 comments
Issue description
When including factor variables as independent (predictor) variables in svy2lme()
, I encountered the following error message:
Error: non-conformable matrix dimensions in Matrix(e1) * e2.
I tracked it down to Line #265 in R/svylmeNG.R
, which computes
xwr<-X*(W%*%r)
To my understanding (inspecting the cases where it worked), this line is used for element-wise multiplying each column of X
with the result of W%*%r
.
Although there are cases where the element-wise multiplication *
operator works in the intended way (i.e. column-by-column on X
), this code is not working reliably for other cases (see minimal reproducible example below).
Solution
If I correctly understood your code, one can simply replace Line #265 in R/svylmeNG.R
(and presumably the corresponding Line #245 in R/svy2relmat.R
) with
Wr <- (W%*%r)
xwr <- Reduce(cbind, apply(X, 2, function(X_j) X_j * Wr))
to make the element-wise multiplication work reliably on each column of X
.
Minimal Reproducible Example
The following code illustrates a case where svy2lme()
fails due to the issue
(but works if you set n_categories <- 3
).
# Load packages
library(survey)
library(svylme)
# Make RNG reproducible
set.seed(1)
# Number of observations in example data
N <- 800
# Define number of categories to generate for predictor x
# Everything works for 1 < n_categories < 4
# For n_categories >= 4:
# Error: non-conformable matrix dimensions in Matrix(e1) * e2
# Note: Everything works if x is not a factor variable!
n_categories <- 4
# Generate random example data
mwe_data <-
data.frame(
id = 1:N,
group = factor(sample.int(50, N, replace = TRUE)),
x = factor(sample.int(n_categories, N, replace = TRUE)),
y = rnorm(N)
)
# Define survey design
design <-
svydesign(
ids = ~id,
data = mwe_data
)
# Estimate model
model <-
svy2lme(
y ~ (1 | group) + x,
design
)
sessionInfo()
output
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] svylme_1.5-1 survey_4.4-2 survival_3.6-4 Matrix_1.7-0
loaded via a namespace (and not attached):
[1] lme4_1.1-35.5 mitools_2.4 lattice_0.22-6 splines_4.4.1 cli_3.6.3 nloptr_2.1.1 DBI_1.2.3 data.table_1.15.4
[9] compiler_4.4.1 boot_1.3-30 tools_4.4.1 nlme_3.1-164 minqa_1.2.7 Rcpp_1.0.12 rlang_1.1.4 jsonlite_1.8.8
[17] MASS_7.3-60.2
Thanks for this. A better fix seems to be to use X*drop(W%*%r)
or even X*as.vector(W%*%r)
I have no idea why this was working for 3 categories and not for 4, though.
Thanks for making the changes so quickly!
That's interesting - I thought the issue was the structure of X,
but if drop(W%*%r)
works, then the problem must lie in W
or r
.
I'm also puzzled about why it worked with 3 categories but not with 4.