# README

Zachary McCaw

2018-09-11
Suppose that a multivariate normal outcome vector is observed for each subject. This package considers the general case where separate regression models are specified for each element of the outcome vector, and the outcome covariance matrix is left unstructured. Utilities are provided for model estimation and for inference on the regression parameters.
Suppose an outcome vector $y_{i}\in\mathbb{R}^{k}$ is observed for each of $n$ subjects. Conditional on covariates, the outcome follows a multivariate normal distribution with unstructured covariance:

$$y_{i} \sim N(\mu_{i},\Sigma), \qquad \mu_{ij} = x_{ij}'\beta_{j},$$

where $x_{ij}$ is the covariate vector and $\beta_{j}$ is the regression coefficient for the $j$th element of the outcome. Below, data are simulated for $n = 10^{3}$ subjects with a trivariate outcome ($k = 3$). The three regression models include two, three, and four covariates, respectively, in addition to an intercept, and the outcome covariance is exchangeable with correlation $0.5$.
```r
library(MNR);
set.seed(100);
# Observations
n = 1e3;
# Design matrices
X1 = cbind(1,matrix(rnorm(2*n),nrow=n));
colnames(X1) = c("int",paste0("x0",1:2));
X2 = cbind(1,matrix(rnorm(3*n),nrow=n));
colnames(X2) = c("int",paste0("x1",1:3));
X3 = cbind(1,matrix(rnorm(4*n),nrow=n));
colnames(X3) = c("int",paste0("x2",1:4));
X = list(X1,X2,X3);
# Target parameters
b1 = c(-1,0.1,-0.1);
b2 = c(1,-0.1,0.1,0);
b3 = c(0,0.1,-0.1,0.1,-0.1);
b = list(b1,b2,b3);
# Exchangeable covariance structure
S = array(0.5,dim=c(3,3)) + 0.5*diag(3);
# Generate data
Y = rMNR(X=X,b=b,S=S);
```
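For intuition, correlated normal data with a given covariance can be generated from independent standard normals via a Cholesky factor. The following is a minimal base-R sketch of this standard construction, not the package's internal implementation:

```r
# Exchangeable 3x3 covariance, as above: unit variances, correlation 0.5.
S = array(0.5, dim=c(3,3)) + 0.5*diag(3);
# Upper-triangular Cholesky factor: t(R) %*% R = S.
R = chol(S);
# Rows of Z are iid standard normal; rows of E are draws from N(0, S).
Z = matrix(rnorm(1e3*3), nrow=1e3);
E = Z %*% R;
# The sample covariance of E should approximate S.
round(cov(E), digits=1);
```

Adding the regression means $x_{ij}'\beta_{j}$ column-wise to `E` then yields an outcome matrix of the form simulated above.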
The outcome `Y` is expected as a numeric matrix, with observations as rows. Covariates are supplied as a list of numeric matrices, one for each column of `Y`.
Below, parameters are estimated for the trivariate normal data generated above. The fitted model is of class `mnr`. The show method provides a table of estimated regression coefficients, along with standard errors, Wald-type confidence intervals, and p-values. The fitted model `M` contains the following components:

- Regression coefficients, extracted using `coef(M)` or `M@Coefficients`.
- Outcome covariance matrix, extracted using `vcov(M,type="Outcome")` or `M@Covariance`.
- Regression coefficient information matrix, extracted using `vcov(M,type="Information")` or `M@Information`.
- Outcome residual matrix, extracted using `resid(M)` or `M@Residuals`.
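The standard errors in the show table follow from the information matrix in the usual way: the coefficient covariance is the inverse information, and standard errors are the square roots of its diagonal. A sketch with a hypothetical 2x2 information matrix (illustrative values, not package output):

```r
# Hypothetical information matrix for two regression coefficients.
J = matrix(c(1000, 100, 100, 500), nrow=2);
# Coefficient covariance is the inverse information.
V = solve(J);
# Standard errors are the root diagonals.
se = sqrt(diag(V));
```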
```r
# Model fitting
M = fit.mnr(Y=Y,X=X,eps=1e-8);
cat("Show method for fitted model:\n");
show(M);
cat("\n");
cat("Comparison of estimated coefficients with the truth:\n");
Coeff = coef(M);
show(data.frame(Coeff[,c("Outcome","Coeff")],"Est"=round(Coeff$Point,digits=3),"Truth"=unlist(b)));
```
```
## Objective increment: 6.41
## Objective increment: 0.00253
## Objective increment: 1.16e-06
## Objective increment: 3.79e-10
## 3 update(s) performed before tolerance limit.
##
## Show method for fitted model:
## Outcome Coeff    Point     SE       L       U         p
## 1    y1   int -1.00000 0.0312 -1.0600 -0.9420 3.55e-227
## 2    y1   x01  0.14500 0.0248  0.0962  0.1930  5.33e-09
## 3    y1   x02 -0.09390 0.0261 -0.1450 -0.0427  3.21e-04
## 4    y2   int  0.99900 0.0308  0.9390  1.0600 1.08e-230
## 5    y2   x11 -0.02940 0.0244 -0.0771  0.0184  2.28e-01
## 6    y2   x12  0.12000 0.0266  0.0680  0.1720  6.27e-06
## 7    y2   x13  0.01060 0.0263 -0.0408  0.0621  6.85e-01
## 8    y3   int  0.00247 0.0303 -0.0569  0.0618  9.35e-01
## 9    y3   x21  0.11400 0.0262  0.0622  0.1650  1.48e-05
## 10   y3   x22 -0.11000 0.0259 -0.1610 -0.0593  2.11e-05
## 11   y3   x23  0.07000 0.0261  0.0188  0.1210  7.37e-03
## 12   y3   x24 -0.07820 0.0258 -0.1290 -0.0276  2.44e-03
##
## Comparison of estimated coefficients with the truth:
## Outcome Coeff    Est Truth
## 1    y1   int -1.003  -1.0
## 2    y1   x01  0.145   0.1
## 3    y1   x02 -0.094  -0.1
## 4    y2   int  0.999   1.0
## 5    y2   x11 -0.029  -0.1
## 6    y2   x12  0.120   0.1
## 7    y2   x13  0.011   0.0
## 8    y3   int  0.002   0.0
## 9    y3   x21  0.114   0.1
## 10   y3   x22 -0.110  -0.1
## 11   y3   x23  0.070   0.1
## 12   y3   x24 -0.078  -0.1
```
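The confidence intervals and p-values above are Wald-type, so they can be reproduced from a point estimate and its standard error via the normal approximation. A sketch using rounded values from the first row of the table:

```r
# Wald-type 95% CI and p-value for H0: beta = 0,
# using the rounded estimate and SE of the first coefficient above.
est = -1.003;
se = 0.0312;
z = qnorm(0.975);
ci = c(L = est - z*se, U = est + z*se);
p = 2*pnorm(-abs(est/se));
```

Up to rounding of the inputs, `ci` matches the reported interval, and `p` is vanishingly small, consistent with the table.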
Below, hypotheses are tested on the model fit to the trivariate normal data generated above. The column of `Y` of interest is specified using `j`. The hypothesis test is specified using a logical vector `L`. Elements of `L` that are constrained under the null are set to `TRUE`; those estimated under the null are set to `FALSE`. At least one element of `L` must be `TRUE` (a test must be specified), and at least one must be `FALSE` (the null model must be estimable).
- The first test assesses $H_{0}:\beta_{13}=0$, which is false.
- The second test assesses $H_{0}:\beta_{24}=0$, which is true.
- The third test assesses $H_{0}:\beta_{32} = \cdots = \beta_{35} = 0$, which is false.
- The fourth test assesses $H_{0}:\beta_{32} = \beta_{34} = 0.1$, which is true.
```r
cat("Test b13 = 0, which is FALSE:\n");
Score.mnr(Y=Y,j=1,X=X,L=c(F,F,T));
cat("\n");
cat("Test b24 = 0, which is TRUE:\n");
Score.mnr(Y=Y,j=2,X=X,L=c(F,F,F,T));
cat("\n");
cat("Test b32 = ... = b35 = 0, which is FALSE:\n");
Score.mnr(Y=Y,j=3,X=X,L=c(F,T,T,T,T));
cat("\n");
cat("Test b32 = b34 = 0.1, which is TRUE:\n");
Score.mnr(Y=Y,j=3,X=X,b10=c(0.1,0.1),L=c(F,T,F,T,F));
```
```
## Test b13 = 0, which is FALSE:
##        Score           df            p
## 12.572560188  1.000000000  0.000391452
##
## Test b24 = 0, which is TRUE:
##     Score        df         p
## 0.1637477 1.0000000 0.6857293
##
## Test b32 = ... = b35 = 0, which is FALSE:
##        Score           df            p
## 5.088565e+01 4.000000e+00 2.358459e-10
##
## Test b32 = b34 = 0.1, which is TRUE:
##     Score        df         p
## 1.5734197 2.0000000 0.4553405
```
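Each score statistic is referred to a chi-square distribution with the indicated degrees of freedom, so the reported p-values are upper-tail chi-square probabilities. For example, for the second test above:

```r
# Upper-tail chi-square p-value from a score statistic and df.
# Statistic and df taken from the test of b24 = 0 above.
p = pchisq(q=0.1637477, df=1, lower.tail=FALSE);
# p reproduces the reported value, 0.6857293.
```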
Consider testing for association between a specified column of `Y` and each column of a matrix `G`. The function `rScore.mnr` accelerates association testing by recycling the same null model across hypothesis tests. This function is motivated by genetic association testing, where `G` represents genotypes at loci across the genome. The genotype matrix `G` may contain missing values, although the outcome `Y` and the design matrices `X` may not.
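As a generic illustration of per-locus testing with missing genotypes (a simple univariate linear-model scan, not the package's algorithm), each column of a genotype matrix can be tested in turn on its complete cases:

```r
# Illustration: per-locus association p-values for a univariate outcome,
# dropping subjects with a missing genotype at each locus.
set.seed(1);
y = rnorm(100);
G0 = replicate(50, rbinom(n=100, size=2, prob=0.25));
G0[sample(length(G0), size=20)] = NA;
p = apply(G0, 2, function(g){
  keep = !is.na(g);
  # p-value for the genotype slope in a simple linear model.
  summary(lm(y[keep] ~ g[keep]))$coefficients[2, 4];
});
```

Refitting the covariate model from scratch at every locus, as this sketch does, is the cost that recycling the null model avoids.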
```r
# Genotype matrix
G = replicate(2000,rbinom(n=1000,size=2,prob=0.25));
storage.mode(G) = "numeric";
# Introduce missingness
G[sort(sample(length(G),size=0.01*length(G)))] = NA;
# Repeated score test
doMC::registerDoMC(cores=4);
R = rScore.mnr(Y=Y,G=G,X=X,report=T,parallel=T);
# Estimated size
mean(R<=0.05);
```
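Because `G` is simulated independently of `Y`, every test above is conducted under the null, and `mean(R<=0.05)` estimates the type I error at level 0.05. Under the null, p-values are uniform on (0,1), so the estimate should land near 0.05; a sketch with uniform draws in place of actual p-values:

```r
# Under the null, p-values are uniform, so the empirical
# rejection rate at alpha = 0.05 should be close to 0.05.
set.seed(1);
p = runif(2000);
mean(p <= 0.05);
```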