tslumley/svylme

Problems with independent factor variables

Closed this issue · 2 comments

Issue description

When including factor variables as independent (predictor) variables in svy2lme(), I encountered the following error message:

Error: non-conformable matrix dimensions in Matrix(e1) * e2.

I tracked it down to Line #265 in R/svylmeNG.R, which computes

    xwr<-X*(W%*%r)

To my understanding (inspecting the cases where it worked), this line is used for element-wise multiplying each column of X with the result of W%*%r.

Although there are cases where the element-wise multiplication * operator works in the intended way (i.e. column-by-column on X), this code is not working reliably for other cases (see minimal reproducible example below).

Solution

If I correctly understood your code, one can simply replace Line #265 in R/svylmeNG.R (and presumably the corresponding Line #245 in R/svy2relmat.R) with

    Wr <- (W%*%r)
    xwr <- Reduce(cbind, apply(X, 2, function(X_j) X_j * Wr))

to make the element-wise multiplication work reliably on each column of X.

Minimal Reproducible Example

The following code illustrates a case where svy2lme() fails due to the issue
(but works if you set n_categories <- 3).

    # Load packages
    library(survey)
    library(svylme)

    # Make RNG reproducible
    set.seed(1)

    # Number of observations in example data
    N <- 800

    # Define number of categories to generate for predictor x
    #   Everything works for 1 < n_categories < 4
    #   For n_categories >= 4:
    #       Error: non-conformable matrix dimensions in Matrix(e1) * e2
    #   Note: Everything works if x is not a factor variable!
    n_categories <- 4

    # Generate random example data
    mwe_data <-
        data.frame(
            id = 1:N,
            group = factor(sample.int(50, N, replace = TRUE)),
            x = factor(sample.int(n_categories, N, replace = TRUE)),
            y = rnorm(N)
        )

    # Define survey design
    design <-
        svydesign(
            ids = ~id,
            data = mwe_data
        )

    # Estimate model
    model <-
        svy2lme(
            y ~ (1 | group) + x,
            design
        )

sessionInfo() output

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] svylme_1.5-1   survey_4.4-2   survival_3.6-4 Matrix_1.7-0  

loaded via a namespace (and not attached):
 [1] lme4_1.1-35.5     mitools_2.4       lattice_0.22-6    splines_4.4.1     cli_3.6.3         nloptr_2.1.1      DBI_1.2.3         data.table_1.15.4
 [9] compiler_4.4.1    boot_1.3-30       tools_4.4.1       nlme_3.1-164      minqa_1.2.7       Rcpp_1.0.12       rlang_1.1.4       jsonlite_1.8.8   
[17] MASS_7.3-60.2

Thanks for this. A better fix seems to be to use X*drop(W%*%r) or even X*as.vector(W%*%r)

I have no idea why this was working for 3 categories and not for 4, though.

Thanks for making the changes so quickly!
That's interesting - I thought the issue was the structure of X, but if drop(W%*%r) works, then the problem must lie in W or r.
I'm also puzzled about why it worked with 3 categories but not with 4.