0todd0000/spm1dmatlab

Verifying results of SPM tests

Closed this issue · 24 comments

Hello,

Hope you are doing well. Sorry, I would like to verify whether my understanding of my results is correct or whether I am interpreting them wrong.

Dataset: I am using a gait dataset where different magnitudes of soft tissue artefact (STA) models are added to each marker, resulting in 500 gait cycles + 1 reference gait cycle. I have calculated residual errors (501x113) and joint angles (5x113), where 113 is the number of frames and the 5 joint angles come from different joints.

Research question: I want to verify whether there is a relationship between residual error and joint angle error. I have tried to determine this using scatter plots and SOM.

Results:

  1. When you plot the amount of STA added against residual error, you get a linear graph.

[Figure: pelvisDeviationsVsPelvisResidualError]

  2. But scatter plots of STA vs joint angle error or residual error vs joint angle error are inconclusive.

STA vs joint angle error:

[Figure: anklevsDeviations]

Residual error vs joint angle error:
[Figure: anklevsResuiduals]

  3. Parametric and non-parametric ttest2 analyses of joint angle errors and residual errors between 2 groups of 250 each (grouped based on the amount of STA added: lower 250 vs upper 250) show some significant differences. The parametric and non-parametric results are qualitatively similar.

[Figure: SPMAnkle]

Question: Since the scatter plots show no trend but SPM shows some significant differences, can this be attributed to the fact that in scatter plots we are reducing the entire time series to a single point? Or does it not make sense to compare these results, and does SPM indicate something completely different?

Thanks
Vignesh

I don't quite understand the connection between the variables in (1) and (2) and those in (3). Can you please attach some code or pseudo-code that shows how these different variables are calculated? Please also indicate which variables are submitted to spm1d.stats.ttest2.

Hi Todd,

Sorry for the confusion. The variables in (1) and (2) are the mean values of the STA added to the markers, the mean values of the residual error, and the mean values of the joint angle errors.

While the graphs are not explicitly linked: the SPM analysis compares ankle flexion angle error between the bottom 250 subjects and the top 250 subjects. The bottom and top 250 subjects were determined by sorting the STA added to the markers.

My question is: when you look at figure 2a (STA vs joint angle error), there does not seem to be any correlation between the amount of STA noise added and the joint angle error. My takeaway was that joint angle error can be high for both a low amount and a high amount of added STA.

But when we do an SPM test on joint angle errors between two groups (groups determined based on STA added), there seems to be some significant difference. My question is: would this not be reflected in the scatter plot 2a (STA vs joint angle error), or is it not reflected because the time series data has been condensed to a single point which does not capture the differences?

Or have I completely misunderstood and I am trying to compare and contrast two unrelated things?

Thanks

%%% For STA calculation %%%%%%%%
deviations = RMSE difference between the reference marker trajectory and the 500 marker trajectories
Residual errors and joint angles are calculated using OpenSim. Joint angle errors are the RMSE difference between joint angles acquired using the reference trajectory and those acquired using the 500 trajectories.
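
For concreteness, a minimal MATLAB sketch of this deviation computation; the variable names refTraj (113x3 reference trajectory of one marker) and trajs (113x3x500 perturbed trajectories) are hypothetical, not from the original code:

% Hypothetical sketch of the per-trial RMSE ("deviation") computation
deviations = zeros(500, 1);
for i = 1:500
    err = refTraj - trajs(:, :, i);          % frame-wise trajectory error
    deviations(i) = sqrt(mean(err(:).^2));   % scalar RMSE for trial i
end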

SPM is between ankle angle errors of the bottom 250 subjects and the top 250 subjects. Subjects were grouped based on the amount of STA added:
figure()
subplot(2,1,1)
% Parametric two-sample t-test (observations in rows after transposing)
t = spm1d.stats.ttest2(ankleFlexionErrors_bottom250', ankleFlexionErrors_top250');
ti = t.inference(0.05, 'two_tailed', true, 'interp', true);
ti.plot()
ti.plot_threshold_label();
ti.plot_p_values();
title("Parametric test")

subplot(2,1,2)
alpha = 0.05;
two_tailed = true;
iterations = 1000;
% Non-parametric (permutation) two-sample t-test
snpm = spm1d.stats.nonparam.ttest2(ankleFlexionErrors_bottom250', ankleFlexionErrors_top250');
snpmi = snpm.inference(alpha, 'two_tailed', two_tailed, 'iterations', iterations);
disp( snpmi )
snpmi.plot(); snpmi.plot_threshold_label(); snpmi.plot_p_values();
title('Non-parametric test')   % set the title after plotting so it is not overwritten
sgtitle("Top 250 vs bottom 250 ankle flexion angle error")

would this not be reflected in the scatter plot 2a (STA vs joint angle error), or is it not reflected because the time series data has been condensed to a single point which does not capture the differences?

This is a difficult question to answer because it involves several issues:

  • 0D vs. 1D analysis
  • frame-specific analysis vs. all-frames-pooled analysis
  • regression vs. t tests

So your interpretation may indeed be correct but it is not possible to know without further analyses that probe each of these issues. Here are a few suggestions:

  • To probe the issue of (a) frame-specific vs. (b) all-frames pooled analysis, try extracting only the data at the instant of the maximum absolute t-value (around frame 80). Then create a scatter plot of these single-frame (0D) data. You will likely see greater qualitative agreement between the SPM result and the scatterplot.

  • To probe the issue of 0D vs. 1D analysis, repeat the previous single-frame suggestion for several frames with different absolute t-values (e.g. Frames 25, 80, 110). You will likely see good qualitative agreement between the correlation strengths (and directions) and the SPM results.

  • To probe the issue of regression vs. t tests, try conducting SPM regression using spm1d.stats.regress. A sketch of the first and last suggestions follows below.
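
To make the first and last suggestions concrete, here is a minimal MATLAB sketch; the variable names (ankleFlexionErrors as a 500x113 error array, STA as a 500x113 array of added noise) are assumptions, not variables from the original code:

% Sketch (hypothetical variable names): single-frame scatter plot and
% SPM regression of a 1D dependent variable on a 0D independent variable
frame = 80;                                % approximate instant of max |t|
y0d   = ankleFlexionErrors(:, frame);      % 500x1: frame-specific (0D) errors
x0d   = mean(STA, 2);                      % 500x1: mean STA added per trial

figure()
scatter(x0d, y0d)                          % frame-specific (0D) scatter plot
xlabel('Mean STA added'); ylabel('Ankle angle error (frame 80)')

t  = spm1d.stats.regress(ankleFlexionErrors, x0d);   % 0D IV, 1D DV
ti = t.inference(0.05, 'two_tailed', true);
figure(); ti.plot(); ti.plot_threshold_label(); ti.plot_p_values();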

Hi Todd,

Thanks for your reply. I have a couple of follow up questions and comments:

  1. "To probe the issue of (a) frame-specific vs. (b) all-frames pooled analysis, try extracting only the data at the instant of the maximum absolute t-value (around frame 80). Then create a scatter plot of these single-frame (0D) data. You will likely see greater qualitative agreement between the SPM result and the scatterplot."

This worked out quite well. I have attached the scatter plot below, which shows a relationship that was not observed in the previous scatter plots. Does the good agreement with SPM underscore the issue of losing information when using a single point for time series data?

[Figure: ankleAngleErro]

  2. Thanks again. The results did match qualitatively for most of the plots, though comparing the directions in the scatter plots to the SPM results took a while.

  3. I have attached the regression plot for ankle angle errors vs STA added to markers. I had to use the average STA added to markers in the regression, since I got an error if I used time series data for the x variable. I am not sure how to interpret the results. Does the graph mean that between time frames 30-90 the ankle angle errors have a positive relationship with the average STA added to markers (increasing STA leads to increasing ankle angle errors)?

[Figure: regressionAnkleAngleErrors]

Hi,

Sorry. I have a follow up question to the previous one:

  1. I would like to analyse the variation of residual error due to external parameters: marker misplacement, scaling, and joint types. I have created three models for each of those parameters and have 500x113 residual errors for each model.
    I think a 3-way ANOVA would be best, but I am not exactly sure how to interpret the results. Would it be better to use a nested 3-way ANOVA?

  2. Is it possible to determine which parameter affects the residual error the most using a 3-way ANOVA?

  3. Would I need to check just one subject or all 50 subjects for the best result?

I apologise for asking so many questions and I am really grateful for the support.

Does the good agreement with SPM underscore the issue of losing information when using a single point for time series data?

Yes. Pooling frames generally distorts frame-specific effects.



Does the graph mean that between time frames 30-90 the ankle angle errors have a positive relationship with the average STA added to markers (increasing STA leads to increasing ankle angle errors)?

Yes, this appears to be an accurate interpretation. A general interpretation that does not depend on the nature of the variables is: significant positive correlations between the 0D independent variable and the 1D dependent variable over approximately frames 30-90.



I would like to analyse the variation of residual error due to external parameters: marker misplacement, scaling, and joint types. I have created three models for each of those parameters and have 500x113 residual errors for each model.
I think a 3-way ANOVA would be best, but I am not exactly sure how to interpret the results. Would it be better to use a nested 3-way ANOVA?

Perhaps one-way repeated-measures ANOVA (spm1d.stats.anova1rm) would be sufficient?
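
For reference, a minimal sketch of this design in MATLAB; the stacked-array names (residModelA, residModelB, residModelC) are hypothetical:

% Sketch: one-way repeated-measures ANOVA across three models,
% treating the 500 observations as "subjects"
Y    = [residModelA; residModelB; residModelC];      % (1500 x 113) residual errors
A    = [zeros(500,1); ones(500,1); 2*ones(500,1)];   % model label per row
SUBJ = repmat((1:500)', 3, 1);                       % same 500 subjects in each model

F  = spm1d.stats.anova1rm(Y, A, SUBJ);
Fi = F.inference(0.05);
figure(); Fi.plot(); Fi.plot_threshold_label(); Fi.plot_p_values();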



Is it possible to determine which parameter affects the residual error the most using a 3-way ANOVA?
Would I need to check just one subject or all 50 subjects for the best result?

These sound like scientific questions that depend on the nature of your study. Since these questions are not specific to spm1d they cannot be supported in this forum.

Hello Todd,

Thank you so much for your support. I would like to clarify three points:

  1. For the regression analysis: the 0D independent variable is made by taking the mean of time series data. Would the above explanation still hold in that case? In the current example, the regression analysis is between the ankle angle errors (time series) and the mean STA added throughout the trial for each participant (0D data). The reason for the confusion is that we are inferring at which frames the effects are significant in terms of regression; but since the independent variable is also originally a time series variable, does the interpretation hold for the entire independent time series?

  2. I would like to confirm which would fit my analysis best: ANOVA with repeated measures or without. Based on other discussions in the forum, repeated measures is used when the same subjects are tested under different conditions. In my case, I am comparing the residual error acquired using different models fed with the same data; the different models are created by modifying certain parameters. Since the same input data is used for all models, does that mean I should use repeated measures? Or should I use the version without repeated measures, since each model is inherently different, so the calculated residual errors can be treated as coming from different subjects?

  3. I ran a parametric and a non-parametric 3-way ANOVA with repeated measures. Does the attached graph indicate a significant effect of the independent parameters on the dependent variable for the entire gait cycle? Is that the correct interpretation?

[Figure: anova]

Thanks a lot and sorry for the inconvenience

For the regression analysis: the 0D independent variable is made by taking the mean of time series data. Would the above explanation still hold in that case?

spm1d currently supports only regression between (i) a 0D univariate independent variable, and (ii) either a 1D univariate or multivariate dependent variable. Using a 1D independent variable would be a type of cross-correlation analysis but that is not yet supported in spm1d.



I would like to confirm which would fit my analysis best: ANOVA with repeated measures or without.

You previously wrote "...three models for each of those parameters and have 500x113 residual errors for each model." In this case each of the 500 observations from one model can be paired with the corresponding observation from each of the other models. It is therefore a repeated-measures design with 500 "subjects".



I ran a parametric and a non-parametric 3-way ANOVA with repeated measures. Does the attached graph indicate a significant effect of the independent parameters on the dependent variable for the entire gait cycle?

Yes, if the maximum F-value exceeds the critical threshold then you can conclude that the main factor (in this case: Model) has a significant effect on the DV. Here the maximum F-value is about 700, which is well beyond the critical threshold of 4.56. Often upcrossing extent (i.e., suprathreshold cluster width) is also reported, and in this case there does indeed appear to be just one upcrossing that spans the entire 1D domain.

Thanks a lot for your help. I shall close this issue.

Hi Todd,

Hope you are well. Sorry to disturb but I would like to clarify three questions.

  1. I have attached a plot of regression analyses of 5 joint angle errors against residual errors. While I understand the shaded part implies statistical significance, does the SPM{t} curve being above 0 indicate anything? Also, for situations where only a very small part of the curve is statistically significant, is it just chance or does it actually imply that there is a relationship at those few frames (for example, the knee flexion angle errors)?

[Figure: regressionAngleErrorsWithResidualError]

  2. I have attached two graphs. In each graph, the figures on the left show joint angle trajectories acquired using different models, and those on the right show a paired t-test between the joint curves. I performed a paired t-test since the input data is the same and only the underlying model changes (previous discussion). In the first graph, despite qualitatively huge variation between the baseline model and 6 DoF model hip flexion curves, the t-test shows that the Ball model has a larger difference from the baseline model. Is there anything wrong in the result or the interpretation?

[Figure: hipFlexionAngleJointConstraints]

Similarly, in the second figure, despite all 4 curves looking similar, the test states that there are significant differences. I would like to know if I am doing something wrong.

[Figure: ankleAngleMarkerPertubed]

Thanks,
Vignesh

does the SPM{t} curve being above 0 indicate anything?

These are t-values. They indicate effect magnitude (qualitatively similar to effect size).



Also, for situations where only a very small part of the curve is statistically significant, is it just chance or does it actually imply that there is a relationship at those few frames (for example, the knee flexion angle errors)?

It is not possible to conclude whether small areas (or large areas, for that matter) are caused by chance. These tests consider the null hypothesis (H0) of equal means, or equivalently: null effect. If H0 were true, then the observed clusters would be produced randomly with the indicated probability. In other words, p-values pertain to the infinite set of all experiments where H0 is true. P-values do not pertain directly to the observed results of a single experiment.



despite qualitatively huge variation between the baseline model and 6 DoF model hip flexion curves, the t-test shows that the Ball model has a larger difference from the baseline model. Is there anything wrong in the result or the interpretation?

Try plotting pairwise differences. In a paired t-test only paired differences are tested, so the original data may be misleading. If you plot paired differences you will likely see patterns that are more similar to the SPM results.
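
As an illustration, a minimal sketch of plotting the paired differences; the variable names (hipFlexion_ball and hipFlexion_baseline, each 113xN as in the earlier snippets) are assumptions:

% Sketch (hypothetical variable names): plot the paired differences
% that a paired t-test actually evaluates
d = hipFlexion_ball' - hipFlexion_baseline';    % (N x 113) difference curves
figure()
plot(d', 'Color', [0.7 0.7 0.7]); hold on       % individual difference curves
plot(mean(d, 1), 'k', 'LineWidth', 2)           % mean paired difference
plot(xlim, [0 0], 'k--')                        % zero reference line
xlabel('Frame'); ylabel('Paired difference (deg)')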

Hi Todd,

Thanks for your reply. Sorry, I did not completely understand the last point. I have attached pairwise differences between the baseline model and the Ball model, and between the baseline model and the 6 DoF model. The magnitudes seem larger in the latter, but the SPM plots (top right and middle right) seem more significant for the Ball vs baseline comparison. Or did you mean a different t-test?

[Figure: pairwise comparison]

[Figure: hipFlexionAngleJointConstraints]

Thanks

Look at the distribution of the data at around Frame 45. The mean moves relatively far from the zero line at this point. To see this more clearly, try superimposing the mean over all of the other curves. The t-value will roughly follow this mean.

Hi Todd,

Just to confirm whether I am understanding correctly: the statistical test shows significance because of how far the mean moves from 0, not because of the magnitude of the variation itself. For example, in the graph below, although the variation (pairwise differences) between the baseline and Ball models has a magnitude of -10 to 10 while that between the baseline and 6 DoF models is between -60 and 60, there is less significance in the latter because its mean stays around 0, whereas in the first case the mean moves further away from 0.

[Figure: comparisons]

So would the interpretation of the result be that the 6 DoF model and the baseline model have similar joint angles, while the baseline and Ball models have significantly different joint angles?

Thanks a lot

The ranges -10 to 10 and -60 to 60 do not matter. The only quantity that matters is the t-statistic, which is roughly the ratio mean / variance. Please try plotting the mean with SD clouds, for example using spm1d.plot.plot_meanSD. The results will be similar to the paired test results.
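
A minimal sketch using spm1d.plot.plot_meanSD, reusing the hypothetical paired-difference array d (N x 113) from the sketch above:

% Sketch: mean with SD cloud of the paired differences; the t-statistic
% roughly tracks this mean relative to the width of the cloud
figure()
spm1d.plot.plot_meanSD(d);
hold on; plot(xlim, [0 0], 'k--')   % zero reference line
title('Paired differences: mean and SD cloud')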

So would the interpretation of the result be that the 6 DoF model and the baseline model have similar joint angles, while the baseline and Ball models have significantly different joint angles?

The null hypothesis is rejected in both of the results above. The interpretation is that there are statistically significant differences in both sets of results.

Hi Todd,

Sorry for disturbing again, and thanks for the support. I need to clarify a point regarding post hoc analyses. I have done a one-way rm ANOVA with 4 different models (variable index = 0, 1, 2, 3), each with 501 subjects. I obtained a significant effect, so I performed post hoc analysis using paired t-tests and a corrected alpha with nTests = 4. I have attached the graph below.
[Figure: anova1]

My question: I receive the same p-value for all t-tests (middle three graphs). Is this normal? Or am I doing something wrong? I have attached part of the code below:

% Stack the four models' residual errors (observations in rows)
totalResidualError = [totalResidualError_RMS_Walker'; ...
    totalResidualError_RMS_Walker_markerPertubations1'; ...
    totalResidualError_RMS_Walker_markerPertubations1_005'; ...
    totalResidualError_RMS_Walker_markerPertubations1_0125'];
index   = [zeros(501,1); ones(501,1); 2.*ones(501,1); 3.*ones(501,1)];   % model labels
subject = 1:501;
SUBJ    = [subject'; subject'; subject'; subject'];   % same 501 subjects per model
% Median residual-error curves per model
totalResidualErrorBase   = median(totalResidualError_RMS_Walker);
totalResidualErrorModel1 = median(totalResidualError_RMS_Walker_markerPertubations1);
totalResidualErrorModel2 = median(totalResidualError_RMS_Walker_markerPertubations1_005);
totalResidualErrorModel3 = median(totalResidualError_RMS_Walker_markerPertubations1_0125);

figure()
subplot(3,3,[1,4,7])
alpha = 0.05;
two_tailed = true;
iterations = 100;
snpm = spm1d.stats.nonparam.anova1rm(totalResidualError, index, SUBJ);
snpmi = snpm.inference(alpha, 'iterations', iterations);
disp( snpmi )
snpmi.plot(); snpmi.plot_threshold_label(); snpmi.plot_p_values();
title('I) One-way rm ANOVA','FontWeight','bold');
xlabel("Frames",'FontWeight','bold'); ylabel("SPM {F}",'FontWeight','bold')

subplot(3,3,2)
alpha = spm1d.util.p_critical_bonf(0.05, 4);   % Bonferroni-corrected alpha
two_tailed = true;
iterations = 120;
snpm = spm1d.stats.nonparam.ttest_paired(totalResidualError_RMS_Walker', ...
    totalResidualError_RMS_Walker_markerPertubations1');
snpmi = snpm.inference(alpha, 'two_tailed', two_tailed, 'iterations', iterations);
snpmi.plot(); snpmi.plot_threshold_label(); snpmi.plot_p_values();
title({"II a) Total residual errors between baseline and model 1"},'FontWeight','bold');
xlabel("Frames",'FontWeight','bold'); ylabel("SPM {t}",'FontWeight','bold')

Probability densities from permutation tests have finite resolution that depends directly on the number of permutations (i.e., iterations). Since the post hoc tests use iterations = 120 you are seeing p = 1/120 ≈ 0.0083 as the p-value; this is the minimum possible p-value for iterations = 120. As general guidelines:

  • iterations should be as large as possible, usually 10,000 or 100,000 (or -1 for all permutations if the total number of permutations is less than 100,000)
  • iterations should not change for post hoc analysis (see the sketch below)
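
Applied to the code above and keeping everything else unchanged, these guidelines amount to something like the following sketch (the 10,000 figure is illustrative):

% Use one sufficiently large iteration count for the main test AND all
% post hoc tests (or -1 for all permutations when feasible)
iterations = 10000;

snpm  = spm1d.stats.nonparam.anova1rm(totalResidualError, index, SUBJ);
snpmi = snpm.inference(0.05, 'iterations', iterations);

alpha_corrected = spm1d.util.p_critical_bonf(0.05, 4);   % Bonferroni-corrected alpha
snpm  = spm1d.stats.nonparam.ttest_paired(totalResidualError_RMS_Walker', ...
    totalResidualError_RMS_Walker_markerPertubations1');
snpmi = snpm.inference(alpha_corrected, 'two_tailed', true, 'iterations', iterations);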

Thank you. Just to confirm:

  1. The corrected alpha value uses just the number of tests and is not dependent on any other variable. So in the previous example, alpha_corrected = spm1d.util.p_critical_bonf(0.05, 4)?

  2. If parametric and non-parametric tests are qualitatively similar but normality fails, would we still have to report the non-parametric results? I ask because processing becomes slow with a large number of iterations.

Thanks,
Vignesh

Hi, sorry,

An additional question to the above one: when I run using iterations = 5000, I get the following error while plotting p-values:
Error using text
Non-numeric data is not supported in 'Text'.

What command causes that error?

The plot-p-values command: plot_p_values();

The corrected alpha value uses just the number of tests and is not dependent on any other variable. So in the previous example, alpha_corrected = spm1d.util.p_critical_bonf(0.05, 4)?

Yes, this is reasonable if the total possible number of post hoc tests is 4.

If parametric and non-parametric tests are qualitatively similar but normality fails, would we still have to report the non-parametric results? I ask because processing becomes slow with a large number of iterations.

Please refer to: #155, #147, #166 and other threads in this forum.

I'm going to close this thread now because several issues have been reported here. Please feel free to open a new issue. Preferably focus each issue on a single topic like "interpreting nonparametric results" or "appropriate post hoc procedures".

The plot-p-values command: plot_p_values();

Perhaps because there are no suprathreshold clusters? If that does not solve the problem, please open a new issue with a subject like: "Error: Non-numeric data is not supported in 'Text'"