0todd0000/spm1dmatlab

ANCOVA

Closed this issue · 11 comments

Dear Todd,

I have data in which I would like to correct for gait speed. Is this currently possible with SPM using an ANCOVA or using another method?

Thank you for your answer!

Mariska

Dear Mariska,

Thank you for the question. ANCOVA is not yet directly implemented in spm1d with a high-level interface, but it will be in a future version of the software. At the moment relatively simple ANCOVA can be conducted using the general linear model interface (spm1d.stats.glm):
http://www.spm1d.org/doc/Stats1D/onetwosample.html#general-linear-model-glm
The key code is from the example above (with MATLAB syntax) is:

>> X          = zeros(nCurves, nFactors);
>> X(:,1)     = x;       %regressor (continuous variable)
>> X(:,2)     = 1;       %intercept
>> X(:,3)     = linspace(0,1, nCurves);   %linear drift
>> X(:,4)     = sin( linspace(0, pi, nCurves) );  %sinusoidal drift

Here the design matrix X contains the two typical regression factors (a regressor and an intercept) and also two nuisance factors (linear and sinusoidal drift). The effects that can uniquely be attributable to the regressor are given by the contrast vector:

>> c          = [1 0 0 0]';  %only the first column is of empirical interest

Hypothesis testing can be conducted by submitting both X and c to general linear model analysis:

>> t  = spm1d.stats.glm(Y, X, c);
>> ti = t.inference(0.05, 'two_tailed', false);
>> ti.plot()

This approach can work quite flexibly for t statistic computations in ANCOVA designs, but F statistic computations (i.e. for multi-group and/or multi-factor designs) are a bit trickier to implement with the current spm1d.stats interfaces to ANOVA models.

Will the spm1d.stats.glm be sufficient for you for the moment?
If you prefer F computations I could post some code here which demonstrates some full ANCOVA possibilities.

Todd

Dear Todd,

Thank you very much for your extensive reply.
However, I have data of multiple test occasions, therefore I will need to use the F statistic computations. Would it be possible to show an example of how to implement the ANCOVA for such a design?

Mariska

Hi Mariska,
Sure, no problem.
Please give me a few days to put something together, then I'll post the code here.
Todd

Hi Mariska,

I've got some code working for simple ANCOVA, but I haven't been able to test it thoroughly (e.g. arbitrary numbers of groups and arbitrary numbers of factors) because there don't appear to be many good example datasets available on the web. It will probably take me about 20 more hours of development to get things tested properly, so I'll need to postpone release of the ANCOVA code until a later time, likely with the next major release of the software.

In the meantime, I suggest using the approach discussed above for t tests; when conducting ANOVA it is usually the post hoc t comparisons that are of more interested than the F results, so I think it would be fine to skip directly to post hoc tests while ANCOVA functionality is being developed.

Todd

P.S. I've added ANCOVA to spm1d's list of features in development.

Dear Todd,

Thank you very much for all the trouble to try to get the ANCOVA working on such short notice. I will for now try the approach for t tests and I look forward to the next release!

Mariska

Hi Todd,
I wanted to follow up to see if the ANCOVA code has been released? Similar to Mariska, I am interested in controlling for self-selected running speed when comparing adolescents of different groups (example: strong vs weak). Is it possible to control for each participant's unique self-selected speed?
Thanks for your help!
Micah

Hi Micah,
Yes, this should be possible using spm1d.stats.glm as indicated in the second posting above. Does this solve the problem?
Todd

Thanks for the quick reply Todd. I have a couple more questions for clarification for the code provided above:

X = zeros(nCurves, nFactors);
X(:,1) = x; %regressor (continuous variable)
X(:,2) = 1; %intercept
X(:,3) = linspace(0,1, nCurves); %linear drift
X(:,4) = sin( linspace(0, pi, nCurves) ); %sinusoidal drift

c = [1 0 0 0]'; %only the first column is of empirical interest

t = spm1d.stats.glm(Y, X, c);
ti = t.inference(0.05, 'two_tailed', false);
ti.plot()

Does nCurves = number of participants for each group or number of groups being compared?
Does nFactors = number of covariates?
Does x = the covariate value?
Does Y = a matrix of the the dependent variable with each row a different participant?

Does nCurves = number of participants for each group or number of groups being compared?

Possibly, but more accurately: nCurves is the total number of observations. The observations are assembled into a (nCurves, nTimePoints) data array Y. The design matrix X must have one row for each row of the data array.

Does nFactors = number of covariates?

No, nFactors is the number of modeled factors. For simple linear regression (y = ax + b) there are two factors: an intercept b and a slope a. The example above adds two more factors. Usually a "covariate" is a non-intercept factor that is not directly of interest. In the model above x is the variable of interest, and there is one intercept and two covariates, thus totaling four modeled factors.

Does x = the covariate value?

In this example x represents the independent variable of empirical interest. The nuisance factors (i.e., covariates of no empirical interest) are in the final two columns: linspace(0,1, nCurves) and sin( linspace(0, pi, nCurves) ).

Does Y = a matrix of the the dependent variable with each row a different participant?

Yes, but there may be more than one row for each participant depending on the design.



In case you've not yet tried running the example script, please try it out:

./spm1dmatlab/examples/stats1d/ex1d_glm.m

By playing around with the inputs (e.g. adding or subtracting columns from X) you should get a sense of different ways X can be used to model the data.

The following related discussions may also be helpful:

These examples help tremendously, thanks Todd!
Hopefully one last question: if I wanted to control for a categorical group, I would add another factor to X for the group value? And then specify that column for the c variable?

X = zeros(nCurves, nFactors);
X(:,1) = x(:,1); %independent variable (running speed)
X(:,2) = x(:,2); %categorical variable (group [male=1, female =2])
X(:,3) = 1; %intercept
X(:,4) = linspace(0, 1, nCurves);
X(:,5) = sin( linspace(0, pi, nCurves) );

c = [1 1 0 0 0]'; %includes running speed and group in analysis

Categorical variables should be treated as binary factors like this:

X = zeros(nCurves, nFactors);
X(:,1) = x(:,1); %independent variable (running speed)
X(:,2) = x(:,2); %categorical variable (group [male=1, otherwise =0])
X(:,3) = x(:,3); %categorical variable (group [female=1, otherwise =0])
X(:,4) = 1; %intercept
X(:,5) = linspace(0, 1, nCurves);
X(:,6) = sin( linspace(0, pi, nCurves) );

The contrast vector should look something like this:

c = [0 -1 1 0 0 0]'; %mean male-female difference after accounting for all other factors