0todd0000/spm1dmatlab

General Linear Model for a multiple regression analysis


Dear spm1d experts,

I have a question regarding a multiple regression analysis with the spm1d toolbox. I have kinematic data from a clinical population and I would like to study to what extent two different factors play a role in the movement patterns of that population.
I thought about doing this by using the GLM implemented in the toolbox. After loading my data, I specified 5 factors (2 independent variables, the intercept, and two nuisance factors), and it looks like this:

% specify design matrix:
nCurves  = 60;
nFactors = 5;
X = zeros(nCurves, nFactors);
X(:,1) = var1;                             % independent variable 1
X(:,2) = var2;                             % independent variable 2
X(:,3) = 1;                                % intercept
X(:,4) = linspace(0, 1, nCurves);          % nuisance: linear drift
X(:,5) = sin( linspace(0, pi, nCurves) );  % nuisance: sinusoidal drift

% specify contrast vector:
c = [1 1 0 0 0]';

%(1) Conduct SPM analysis:
spm = spm1d.stats.glm(data, X, c);
spmi = spm.inference(0.05, 'two_tailed', false);

I have some questions regarding the code:

  1. I don't entirely understand what factors 4 and 5 would depend on. Does it make a difference if I add them to the model or not? Do they always have to be those lines of code for a specific reason?
  2. In my contrast, I want to take into account both variables (var1 and var2), but I do not want to compare them... is that possible with the toolbox? Should I add something to the code? Or just get the values of the betas?
  3. The two variables (var1 and var2) are intercorrelated, so based on the SPM literature for neuroimaging, I should orthogonalize one variable with respect to the other, right? If I do not orthogonalize one of the variables, I do not obtain any significant results, but if I orthogonalize it (which is the same as writing [1 -1] instead of [1 1] in my contrast), I obtain one cluster (I attached a figure of the results).
  4. The last question is how to interpret these results. Can I know with this approach how much of the variability in the movement pattern is explained by each of the variables? Should I also report the beta values that spm1d computes?

I would appreciate any kind of input to this, because I guess I am missing some things.

Thank you very much in advance!
Regards,
Cristina

[Figure: glm_test1]

Hi Cristina,

I'll respond below...

  1. I don't entirely understand what factors 4 and 5 would depend on. Does it make a difference if I add them to the model or not? Do they always have to be those lines of code for a specific reason?

If the data are uncorrelated with the nuisance factors (factors 4 and 5) then removing them will have little effect on the results. They only make a difference if they are actually correlated with your data. The code above models linear drift (factor 4) and sinusoidal drift (factor 5). Linear drift can occasionally occur in measurements, due for example to electronic baseline drift in the measurement equipment. Sinusoidal drift can occur in long experiments, for example, where performance can be affected by things like time-of-day, hunger, body temperature, etc. But the nuisance variables can be anything you want. If you have no reason to model linear and/or sinusoidal drift then it might be best to remove them from the model.
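
For example, if neither drift term is relevant, a reduced model would look something like this (a sketch, reusing the variables from your code above):

% reduced design matrix: no drift terms
X = zeros(nCurves, 3);
X(:,1) = var1;   % independent variable 1
X(:,2) = var2;   % independent variable 2
X(:,3) = 1;      % intercept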

  2. In my contrast, I want to take into account both variables (var1 and var2), but I do not want to compare them... is that possible with the toolbox? Should I add something to the code? Or just get the values of the betas?

Yes, just create two different contrast vectors:

c1 = [1 0 0 0 0]';
c2 = [0 1 0 0 0]';

and then compute two separate SPMs.
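
For example (a sketch, reusing data and X from your code above, and mirroring your inference call):

spm1  = spm1d.stats.glm(data, X, c1);   % effect of var1
spmi1 = spm1.inference(0.05, 'two_tailed', false);
spm2  = spm1d.stats.glm(data, X, c2);   % effect of var2
spmi2 = spm2.inference(0.05, 'two_tailed', false);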

  3. The two variables (var1 and var2) are intercorrelated, so based on the SPM literature for neuroimaging, I should orthogonalize one variable with respect to the other, right? If I do not orthogonalize one of the variables, I do not obtain any significant results, but if I orthogonalize it (which is the same as writing [1 -1] instead of [1 1] in my contrast), I obtain one cluster (I attached a figure of the results).

The GLM procedure accounts for correlations not only between var1 and var2 but among all modeled variables (in this case: five variables). Specifying the contrasts as above should give you what you want. The contrast "[1 -1 0 0 0]" represents a two-sample t test with nuisance factors, but only if var1 and var2 are binary and represent two groups, so I don't think this is what you want. The contrast "[1 1 0 0 0]" is an ANOVA-like contrast (representing simultaneous effects in var1 and var2), so it probably shouldn't be used with spm1d.stats.glm, which is currently designed only to compute t contrasts.

  4. The last question is how to interpret these results. Can I know with this approach how much of the variability in the movement pattern is explained by each of the variables? Should I also report the beta values that spm1d computes?

For the "c1" contrast above the interpretation is: correlation with var1 after accounting for linear correlations associated with all other factors. The interpretation is similar for "c2". The beta trajectory for "c1" will probably be quite similar to the corresponding SPM{t}, so it shouldn't be necessary to report beta. The main value of beta is in multi-level modeling: level 1: apply the identical model to multiple subjects (but with different nuisance factor values, for example), level 2: use the multiple beta trajectories in a second-level analysis (usually a one-sample t test on the betas).

Todd

Hi Todd,

Thanks a lot for the quick response.
With regard to building the contrasts, I have done it as you suggested and obtained the attached image. How can I interpret how much of the variability in the kinematic data each variable explains in the regression model? I understand this as a typical multiple regression for scalars, where we get the explained variability for each variable, but I don't know how to interpret this in SPM (or whether I'm missing something). So far I understand that I get a graph and an inference per contrast (variable), but I would like to have both in the model, since my hypothesis is that both variables define the movement pattern.
Maybe I should not use GLM...?

Thanks in advance.
Regards

Cristina

[Figure: glm_test2]

Hi Cristina,

Both var1 and var2 are indeed included in the model, but testing the overall model fit requires an F contrast, and the current version of spm1d.stats.glm is set up only for t contrasts. At the moment you can use spm1d.stats.glm only to conduct post hoc analyses for multiple regression, as you have done above.

One way to roughly infer the relative variability contributions from these post hoc analyses is to compare the t statistic magnitudes. From the figure it would appear that var2 is correlated much more strongly with the dependent variable than var1 is, and probably also that var2 is more strongly correlated than var1 at all points in time. To check, you can try plotting the absolute t values on a single graph and/or plotting their element-wise absolute ratio: x = abs(t_var2 ./ t_var1).
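
For example (a sketch, assuming the two inference objects from above and that the test statistic trajectory is stored in their z field, as in spm1d's SPM objects):

t1 = spmi1.z;   % SPM{t} trajectory for var1
t2 = spmi2.z;   % SPM{t} trajectory for var2
figure;  plot(abs(t1));  hold on;  plot(abs(t2));
legend('|t| var1', '|t| var2');
figure;  plot( abs(t2 ./ t1) );   % element-wise absolute ratio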

Another indirect way of assessing goodness-of-fit is to additionally run simple linear regression, once for var1 and once for var2. The SPM{t} shapes will likely be slightly different, so you will be able to roughly interpret how much better the GLM fit is.
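
For example (a sketch, assuming data, var1 and var2 as above):

spm_r1  = spm1d.stats.regress(data, var1);   % simple regression on var1
spm_r2  = spm1d.stats.regress(data, var2);   % simple regression on var2
spmi_r1 = spm_r1.inference(0.05, 'two_tailed', true);
spmi_r2 = spm_r2.inference(0.05, 'two_tailed', true);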

For purposes of scientific reporting I think it would be fine to report the two results above, potentially along with the simple regression results, and not report an overall model fit (F statistic) result; since the var2 effect is clearly quite strong, the overall model fit would almost certainly reach significance.

Although not immediately helpful, note that I've added "multiple regression" to the list of features to be developed for future releases:
0todd0000/spm1d#45

Todd

Hi Todd,

Thanks a lot again for the super quick answer and for your help! Everything looks much clearer now :-)
Thanks a lot also for including multiple regression in the list of features to be developed. I think it will be very useful!

I think that maybe focusing on var2 makes more sense, since it indeed correlates much more strongly than var1. This way I can also use the GLM as it is currently implemented (until the next version is released).
I have a couple of questions regarding the model. I was thinking about the factors that you set as defaults to control for the drifts, and as far as I understand, they are more related to sensors, right? We are using Vicon cameras to record our 3D kinematic data, so I think they are not necessary. I would like to control for age and movement speed, though. If I understood correctly, I can add those continuous variables to the GLM instead of the drift terms, right? If I do so, my contrast would be [1 0 0 0], where the first element is var2, the second the intercept, and the third and fourth age and speed, respectively; something like the sketch below.
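
A sketch of what I have in mind (age and speed here would be my column vectors of subject ages and movement speeds):

% revised design matrix: subject nuisance factors instead of drift terms
X = zeros(nCurves, 4);
X(:,1) = var2;    % predictor of interest
X(:,2) = 1;       % intercept
X(:,3) = age;     % nuisance: age
X(:,4) = speed;   % nuisance: movement speed

c = [1 0 0 0]';   % test var2, controlling for the other factors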
Could you please confirm that this is correct?

A side comment: if I calculate the absolute ratio as you suggested, I find that var2 correlates 2.89 times more strongly with the data than var1, which makes sense. Thanks for the suggestion!

Thanks a lot in advance!
Regards,
Cristina

Hi Cristina,

Yes, the linear and sinusoidal nuisance factors probably relate most closely to sensor drift, and may not be directly relevant to motion capture systems. However, it is plausible that motion capture systems are affected by laboratory factors like room temperature and ambient light, which may vary linearly over the course of a few hours and sinusoidally over the course of a day. Laboratory factors like distance-to-subject, and subject factors like movement velocity or movement plane, might also affect measurements. I expect that modern motion capture systems are robust to these factors, but if you are concerned about any of them it may be worth including them in a GLM.

As you suggest, adding subject nuisance factors like age and speed is fine. And yes, contrast vector elements corresponding to the nuisance factors should be zeros. Note, however, that the GLM also allows you to test the nuisance factors directly, for example with the contrast vectors [0 0 1 0] and [0 0 0 1] in your four-factor design (see the sketch below). This shouldn't be done for the main analyses, because nuisance factors are by definition not of empirical interest, but those analyses might be worthwhile as supplementary information. In particular, if there is in fact a correlation with a nuisance factor, it would suggest that the experiment was probably not optimally designed to focus on the factors of empirical interest.
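
A minimal sketch (reusing the four-factor design matrix X from your message above):

c_age    = [0 0 1 0]';   % test the age nuisance factor directly
spm_age  = spm1d.stats.glm(data, X, c_age);
spmi_age = spm_age.inference(0.05, 'two_tailed', true);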

Cheers,
Todd

Thanks a lot, Todd, for the responses and the advice.

While doing the last part of the analysis, I came up with another question.
I have 8 conditions and 13 angles per condition in my data. I wanted to ask whether you could share your thoughts regarding the best correction to implement.
I think reviewers may be concerned if I always use 0.05 as the alpha threshold, and I was wondering whether there is a better way of correcting using SPM. I thought about simply being stricter and choosing 0.01 as the alpha threshold, but one could also argue for an alpha obtained by dividing 0.05 by the number of conditions or angles. Note that I don't compare conditions, since they actually involve different tasks; I actually compare a group of healthy individuals with a clinical population.
Is there something I have to take into account for the SPM analysis?

Thanks a lot in advance!
Cheers,
Cristina

Hi Cristina,

In spm1d there are currently just two options: no correction, or a Bonferroni correction across tests, which is very nearly the same as the correction you suggest. A cross-test Bonferroni correction is the most conservative correction possible, so if your results exceed that threshold I don't think any reviewer would challenge the "effect" result. The converse is true for not correcting across tests: that approach is rather liberal, so if your results fail to exceed even that threshold I don't think a reviewer would challenge the "lack of effect" result.
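
In code, the cross-test Bonferroni correction amounts to an adjusted alpha (a sketch, assuming all 8 x 13 = 104 analyses are corrected together):

nTests = 8 * 13;          % conditions x angles
alpha  = 0.05 / nTests;   % Bonferroni-corrected test-level alpha
spmi   = spm.inference(alpha, 'two_tailed', true);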

Interpretation problems arise only if a result exceeds the normal RFT threshold but not a cross-test Bonferroni-corrected RFT threshold. In that case the data suggest that there may be a real effect, but one too weak to cross a conservative threshold, and I think the results should be interpreted cautiously, using careful wording like:

"There appeared to be an effect of X on Y, and while this effect was strong enough to exceed the RFT threshold at alpha=0.05, it did not exceed a cross-test Bonferonni-corrected critical p value of 0.0XX. We therefore conclude only moderate evidence for this effect."

Note that this discussion regarding correction choice applies almost identically to one's choice of alpha. The literature has adopted a convention of alpha=0.05, but does that mean that a result of p=0.06 represents no scientific evidence and that a result of p=0.04 represents strong scientific evidence? Of course not. I personally feel that all apparent effects should be interpreted cautiously, regardless of how clear they may appear, because slightly different results are certain to appear if the experiment is re-run with new subjects.

As long as you can convince a reviewer that your results are not sensitive to your data-processing decisions, I think they'll be happy.

Todd

Hi again,
I'm closing this issue for now, but if anything additional comes up please feel free to re-open this issue or create a new one.
Thanks!
Todd