General linear model

Question

Closed this issue 8 years ago · 6 comments

Hey all,
I want to use the general linear model function with multiple independent variables. The example that accompanies glm.m has only one independent variable. I modified x to hold 6 variables and tried to change the code, but got errors.

From the example I do not understand what nFactors is, as there are only 3 walking speeds...
It would be really helpful if you could give me a short explanation of these lines of code:
nFactors = 4;
X = zeros(nCurves, nFactors);
X(:,1) = x;
X(:,2) = 1;
X(:,3) = linspace(0, 1, nCurves);
X(:,4) = sin( linspace(0, pi, nCurves) );

Do I have to create a 3-dimensional matrix, with one X matrix for each independent variable?
Kind regards,

Sina

Answer 1 · 2016-06-09T08:39:24.000Z

Hi Sina,

Thanks for your question. Roughly speaking, nFactors is the number of experimental factors. More precisely, it is the number of continuous experimental factors plus the total number of levels of categorical factors (in the code, it sets the number of columns of the design matrix X). This might be easier to understand by example:

1. Two-sample t test. Imagine five observations for each of two groups. The independent variable is GROUP, with two categorical levels: Group1 and Group2. The design matrix has ten rows (one per observation) and two columns, one for each categorical level.
2. Simple linear regression. Imagine that the five values of the independent variable (e.g. body mass) are 70, 65, 80, 79, and 73. The design matrix has five rows and two columns, one for the continuous independent variable (body mass) and one for the intercept.
3. Cited example. There are four columns in the design matrix. The first two represent simple linear regression, just like example 2 above. The third and fourth columns represent continuous nuisance factors: a linear one and a sinusoidal one.
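
As a rough illustration, the three design matrices described above could be built like this. It is only a sketch: the names X1, X2 and X3 are made up here, and the body-mass values are reused as a stand-in for x in the cited example, which uses its own data.

```matlab
% Example 1: two-sample t test, five observations per group.
% One indicator column per group level.
X1 = [ones(5,1)  zeros(5,1);
      zeros(5,1) ones(5,1)];

% Example 2: simple linear regression on body mass.
mass = [70 65 80 79 73]';
X2   = [mass, ones(5,1)];                    % [regressor, intercept]

% Example 3: same layout as the cited example (nFactors = 4).
% Assumption: x is a stand-in; the real example uses its own x.
x        = mass;
nCurves  = numel(x);
nFactors = 4;
X3       = zeros(nCurves, nFactors);
X3(:,1)  = x;                                % regressor of interest
X3(:,2)  = 1;                                % intercept
X3(:,3)  = linspace(0, 1, nCurves);          % linear nuisance factor
X3(:,4)  = sin( linspace(0, pi, nCurves) );  % sinusoidal nuisance factor

disp(size(X1));                              % rows x columns of example 1
```

In each case the number of columns is what the example code calls nFactors, and the number of rows is the number of observations (curves).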

Regarding the error you mention, if the discussion above doesn't solve the problem, then please send some more details about the error. The following would be good:

Your MATLAB script up to and including the line which generates the error.
The full error message (copied and pasted).
If possible, a description of the variables: (i) continuous or categorical, (ii) the number of levels for each categorical variable, and (iii) whether or not it is a nuisance variable.

Cheers,

Todd

Answer 2 · 2016-06-09T11:44:05.000Z

Hey Todd!
Thanks for this quick answer. With your advice I made it work.
Maybe you can just help me with the interpretation.
My dataset is like this:
I have two independent variables for 17 subjects: one categorical (0 and 1, so 2 levels) and one continuous. I'm not sure how to answer the question about nuisance variables, but it sounds important... My dependent variable is a set of forces, each normalized to 101 data points, for the 17 subjects.

I changed the code to (please correct me if it is nonsense):

nCurves = numel(x(:,1));
nFactors = 5;
X = zeros(nCurves, nFactors);
X(:,1) = x(:,1); % categorical variable
X(:,2) = x(:,2); % continuous variable
X(:,3) = 1; % This just tells the model that there is a linear correlation, right?
X(:,4) = linspace(0, 1, nCurves);
X(:,5) = sin( linspace(0, pi, nCurves) );
% specify contrast vector:
c = [1 1 0 0 0]'; % taking into account the first and second variable

I attached the results figure.
If I'm right, I will have to do some sort of post hoc testing. Is the way I modified the code correct?

glm.pdf

Thank you!

Answer 3 · 2016-06-09T22:53:10.000Z

Hi Sina,

A nuisance factor is a factor that might affect your dependent variable(s) but which is not of explicit empirical interest. A common example is linear drift: electronic sensor measurements can sometimes drift over time. These factors can be included in the model, but effects associated with them are not tested directly.
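
To make the idea concrete, here is a minimal sketch at a single time node, using plain least squares rather than spm1d itself. All numbers are made up: the data have a true slope of 2 on x plus a linear drift across trials, and the two fits show what happens to the slope estimate with and without a drift column in the model.

```matlab
% Hypothetical single-node data: a true slope of 2 on x, plus linear drift.
x     = [70 65 80 79 73]';          % independent variable of interest
n     = numel(x);
drift = linspace(0, 1, n)';         % nuisance factor: linear drift over trials
y     = 2*x + 3 + 5*drift;          % noise-free, so the full fit is exact

% Model with the nuisance column: [x, intercept, drift]
X_full = [x, ones(n,1), drift];
b_full = X_full \ y;                % least-squares estimates

% Model without it: the drift leaks into the slope estimate
X_red  = [x, ones(n,1)];
b_red  = X_red \ y;

fprintf('slope with drift column:    %.3f\n', b_full(1));
fprintf('slope without drift column: %.3f\n', b_red(1));
```

The drift coefficient b_full(3) is estimated but never tested; it is there only so that the drift does not contaminate the effect of interest.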

Without knowing the content of x, it is difficult to know whether these two lines correctly implement categorical and continuous factors:

X(:,1) = x(:,1); % categorical variable
X(:,2) = x(:,2); % continuous variable

Regarding the line:

X(:,3) = 1; % This just tells the model that there is a linear correlation, right?

That is actually just an intercept. It can be included in the model but may be redundant if you use one or more categorical variables.
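
One way to see the redundancy is to check the rank of the design matrix. The sketch below assumes a hypothetical two-group design coded with one indicator column per level (group sizes and names are made up): adding an intercept adds a column but no rank, because the two indicator columns already sum to a column of ones.

```matlab
nA = 5;  nB = 5;                          % hypothetical group sizes
gA = [ones(nA,1); zeros(nB,1)];           % indicator for group 1
gB = [zeros(nA,1); ones(nB,1)];           % indicator for group 2

X_groups    = [gA, gB];                   % one column per level
X_redundant = [gA, gB, ones(nA+nB,1)];    % same, plus an intercept

% gA + gB equals the intercept column, so the third column adds no rank
rank_groups    = rank(X_groups);
rank_redundant = rank(X_redundant);
fprintf('%d columns, rank %d\n', size(X_redundant,2), rank_redundant);
```

A single 0/1 column together with an intercept is a different coding and is not rank deficient; the redundancy only appears when every level already has its own column.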

As above, without knowing the content of x it is difficult to know whether or not the contrast vector is correct:

c = [1 1 0 0 0]'; % taking into account the first and second variable

If the second column of X is a continuous variable then you probably want the following contrast vector:

c = [0 1 0 0 0]';
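
To illustrate what the two contrast vectors pick out, here is a small sketch with plain least squares at a single node. Everything in it is hypothetical (the 0/1 grouping, the continuous values v and the effect sizes b_true); only the column order follows the X in the question.

```matlab
% Hypothetical design with the same column order as the question, 17 subjects
n = 17;
g = [zeros(8,1); ones(9,1)];              % categorical variable (0/1)
v = mod(7*(1:n)', n);                     % made-up continuous variable
X = [g, v, ones(n,1), linspace(0,1,n)', sin(linspace(0,pi,n))'];

b_true = [4; 0.5; 10; 1; 2];              % made-up effects
y      = X * b_true;                      % noise-free data at a single node
b      = X \ y;                           % least-squares estimates

c_cont = [0 1 0 0 0]';                    % continuous variable only
c_both = [1 1 0 0 0]';                    % sum of two different effects

effect_cont = c_cont' * b;                % recovers b_true(2)
effect_both = c_both' * b;                % recovers b_true(1) + b_true(2)
fprintf('c_cont: %.3f, c_both: %.3f\n', effect_cont, effect_both);
```

The contrast is just a weighted sum of the estimated coefficients, so [1 1 0 0 0]' tests the sum of a group effect and a slope, which is rarely a meaningful quantity.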

Please look at the code for all t-related tests, including:

spm1d.stats.ttest
spm1d.stats.ttest2
spm1d.stats.ttest_paired
spm1d.stats.regress

All of these tests use spm1d.stats.glm and may clarify how to set up the design matrix and contrast vector. You may also be interested in reading some reference materials regarding linear modeling. Two great starting places are Friston et al. (1995) and the SPM documentation repository:

Friston KJ, Holmes AP, Worsley KJ, Poline JB, Frith CD, Frackowiak RSJ (1995). Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2, 189–210.
SPM documentation repository, Wellcome Trust Centre for Neuroimaging. http://www.fil.ion.ucl.ac.uk/spm/doc/

Note especially: if spm1d.stats.glm does not generate errors, it does not mean that the results are correct. Please use this function with caution.

Cheers,

Todd

Answer 4 · 2016-06-10T07:54:59.000Z

Hey Todd,
I compared my results with a simple linear regression for each of my parameters, which made me quite confident that the glm result makes sense.
But I will go through the papers you suggested and come back if there are more questions!

Thank you for your detailed answers!

Sina

Answer 5 · 2020-03-20T03:17:08.000Z

Hi Todd,

I have been trying to run a glm comparing gait data between 2 independent groups (Case/control) using gait speed as a covariate. I have tried to follow your example code and your post to Sina here.

After running this and checking the results, it struck me that I wasn't putting group into the model. I've tried a few things to correct this. What seemed logical to me either didn't change my results, or did change them in a way I'm not confident is right, so I wanted to check.

This is my definition of the design matrix (MATLAB):

X             = zeros(nCurves, nFactors);
X(:,1)        = x;  % regressor (gait speed)
X(1:nA,2)     = 1;  % group 1
X(nA+1:end,3) = 1;  % group 2
X(:,4)        = linspace(0, 1, nCurves);         % linear noise
X(:,5)        = sin( linspace(0, pi, nCurves) ); % sinusoidal noise

Your notes to Sina and on your website say to use stats.glm with caution. Is what I have above for the design matrix correct, and can you expand on what you mean by 'caution'? Is that another way of saying you need to know what you're doing, or are there checks that we should run on the results beyond what we might do for an ANOVA on discrete data?

Thanks

Trevor

Answer 6 · 2020-03-23T06:35:57.000Z

The design matrix looks fine, but may need to be tweaked a bit to more closely represent the experiment. Consider the following.

The linear and sinusoidal noise terms are usually only appropriate if the observations were collected at equally spaced time intervals; if this was not the case, then these columns should probably be deleted.
Ensure that the contrast vector represents the difference between group means.

So I'd suggest:

X             = zeros(nCurves, nFactors);  % nFactors = 3 here (gait speed + two group columns)
X(:,1)        = x; % regressor (gait speed)
X(1:nA,2)     = 1; % group 1
X(nA+1:end,3) = 1; % group 2

c             = [0 -1 1];   % difference between group means (group 2 minus group 1)

Yes, your interpretation of "caution" is correct. The glm function is highly flexible, so users must be confident that their model(s) and contrast(s) are correct when using this function. spm1d doesn't offer any tools to check the implementation, so you might want to compare spm1d's glm results to those from a third-party package (e.g. R, SPSS) to ensure that the model has been implemented correctly.
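
A cheap first check, before going to another package, is to fit the same model under a different but equivalent parameterization and confirm that the effect of interest does not change. The sketch below does this with plain least squares at a single node; the gait speeds, group sizes, group means and "noise" are all made up, and it does not call spm1d.

```matlab
% Hypothetical single-node data: two groups, gait speed as covariate
nA    = 4;  nB = 5;  n = nA + nB;
x     = [1.0 1.2 1.1 1.3 1.2 1.4 1.0 1.3 1.1]';   % made-up gait speeds
gA    = [ones(nA,1); zeros(nB,1)];                 % group 1 indicator
gB    = 1 - gA;                                    % group 2 indicator
noise = 0.1 * sin((1:n)');                         % deterministic "noise"
y     = 0.5*x + 10*gA + 13*gB + noise;

% Parameterization from the suggestion above: [speed, group 1, group 2]
X1 = [x, gA, gB];
c  = [0 -1 1];
d1 = c * (X1 \ y);             % group 2 minus group 1, adjusted for speed

% Equivalent parameterization: [speed, intercept, group-2 dummy]
X2 = [x, ones(n,1), gB];
b2 = X2 \ y;
d2 = b2(3);                    % the same difference under dummy coding

fprintf('difference: %.4f vs %.4f\n', d1, d2);
```

If the two numbers disagree, either the design matrix or the contrast vector is not doing what was intended. Agreement here does not replace the comparison against R or SPSS, since it only checks the point estimate and not the test statistic or inference.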