0todd0000/spm1d

Unbalanced designs

Closed this issue · 13 comments

Hi Todd,

I have a question on unbalanced designs. I think part of the answer has been posted in a previous issue (#43). I plan to use two different statistical tests: a two-way ANOVA with one within- and one between-subject factor, and a random-effects model.

Can I input a dataset with an unbalanced design? For example, in a dataset of ankle plantar-flexion moments, I have 30 subjects, and each subject has a varying number of good trials (3-10). Do the two aforementioned tests accept unbalanced datasets? I couldn't find the answer on the spm1d website. Many thanks.

Regards,
Bernard

Hi Bernard,

Thank you for raising this issue. spm1d's procedures have only been validated for balanced designs, so please either use only means or use the same number of trials for each subject. If the design is unbalanced the F statistics may be incorrect. For balanced designs the same F statistic will be produced regardless of whether you use means or all trials.
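One way to impose balance, as suggested above, is to collapse each subject's trials to a single mean before analysis. A minimal pure-Python sketch of that step (the subject IDs, trial counts, and scalar values are hypothetical; in a real spm1d analysis each trial would be a 1D time series rather than a single number):

```python
from statistics import mean

# Hypothetical data: each subject has a varying number of good trials,
# so the raw trial-level dataset is unbalanced.
trials_by_subject = {
    "subj01": [1.42, 1.38, 1.51],              # 3 trials
    "subj02": [1.10, 1.15, 1.08, 1.12, 1.09],  # 5 trials
    "subj03": [1.60, 1.55, 1.58, 1.62],        # 4 trials
}

# Collapse to one mean per subject: the design is now balanced,
# with exactly one observation per subject per condition.
subject_means = {s: mean(trials) for s, trials in trials_by_subject.items()}

print(subject_means)
```

The balanced per-subject means (or, equivalently, an equal number of trials per subject) can then be submitted to the ANOVA procedures.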

I thought there was something written in the ANOVA documentation at spm1d.org about balanced vs. unbalanced designs, but it doesn't seem to be there. Also I just realized that design balance checks do not seem to be working in spm1d 0.4; the software should raise an error or at least a warning if the design is not balanced. I'll need to find out what the problem is, so give me a day or two to post documents to spm1d.org and also to fix the warning / error messages.

Thanks again for raising this issue!

Todd

Hi again Bernard,

Apologies for the delay. I've checked the code and actually the design balance checks seem to be working fine in both the MATLAB and Python versions of the code. I'm not sure why I thought they weren't working before, but it seems that they're OK. In MATLAB submitting unbalanced data should raise an error that looks something like this:

Error using spm1d.stats.anova.designs.ANOVA2/check_balanced (line 43)
Design must be balanced.

Note that unbalanced data is fine for one-way ANOVA (spm1d.stats.anova1) but that all other ANOVA procedures (including one-way repeated-measures) require balanced data.
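A design-balance check of this kind can be sketched in pure Python (an illustrative reimplementation only; spm1d's actual check_balanced code may differ): a design is balanced when every cell of the factor cross-classification holds the same number of observations.

```python
from collections import Counter

def is_balanced(*factors):
    """Return True if every combination of factor levels occurs equally often.

    Illustrative check only -- spm1d's internal check_balanced may differ.
    """
    cells = Counter(zip(*factors))
    return len(set(cells.values())) == 1

# One-way layout with unequal group sizes (16 vs 14): unbalanced,
# but one-way ANOVA (spm1d.stats.anova1) accepts this case anyway.
A = [1] * 16 + [2] * 14

# Crossing a second factor with the unbalanced A yields cells of
# unequal size (8, 8, 7, 7), so a two-way design would be rejected.
B = ([1] * 8 + [2] * 8) + ([1] * 7 + [2] * 7)

print(is_balanced(A))     # False: 16 vs 14
print(is_balanced(A, B))  # False: cells of 8, 8, 7, 7
```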

Have you tried to submit unbalanced data to spm1d.stats.anova2onerm or another procedure? Please let me know whether or not it produces an error.

Todd

Hi Todd,

Thanks for following this up. I have not used it yet, but I will be doing so within the next three weeks. I will let you know then. Many thanks.

Regards,
Bernard

OK, I'll close the issue for now to indicate that this is not necessarily a software bug. Please feel free to re-open this issue when you get to your analyses.
Todd

Hi Todd,

How have you been? I am running two tests: 1) "Two-way ANOVA with repeated-measures on one factor" and 2) "Three-way ANOVA with repeated-measures on two factors". This is an either/or question; I would love to use test (2) if possible.

However, test (2) raised a "ValueError: Design must be balanced.", while test (1) did not.

SUBJ: 30 subjects
A (Between groups): 16 group 1 vs 14 group 2
B (within group time): 15 pre vs 15 post
optional C (within group side): 15 right vs 15 left.

Why is my design unbalanced for test two and not test one?

Regards,
Bernard

Hi Bernard,

For Factor A it sounds like there is a total of 30 subjects, with 16 in one group and 14 in another?

If that is correct, then to add a repeated-measures Factor B (with two levels: "pre" and "post") there should be 60 total observations (30 for pre and 30 for post).

Then to add a second repeated-measures Factor C (with levels "right" and "left"), there should be 120 total observations (30 pre-right, 30 pre-left, 30 post-right, 30 post-left).
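The observation counts above follow directly from crossing the repeated-measures factors with the subjects; a quick arithmetic check (the factor names and level counts are just the ones from this thread):

```python
n_subjects = 30  # Factor A (between groups): 16 in group 1 + 14 in group 2
levels_B = 2     # Factor B (within-subject time): pre, post
levels_C = 2     # Factor C (within-subject side): right, left

# Each subject is measured once per combination of within-subject levels.
obs_two_way = n_subjects * levels_B              # factors A and B only
obs_three_way = n_subjects * levels_B * levels_C  # factors A, B and C

print(obs_two_way)    # 60 observations (30 pre + 30 post)
print(obs_three_way)  # 120 observations (30 per pre/post x right/left cell)
```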

I think that is how the data should be organized but I'm not totally sure... please let me know if my interpretation doesn't match your experiment.

Todd

Hi Todd,

Many thanks for the quick reply. Sorry for the sloppiness; I forgot to multiply by two. You are totally right: 120 observations (30 pre-right, 30 pre-left, 30 post-right, 30 post-left).

Regards,
Bernard

Hi Bernard,

Thanks for confirming. Following that experimental design I can indeed reproduce the same ValueError. As I recall the reason for this ValueError is simply that spm1d's numerical results have not yet been verified for some designs using independently published results.

For example, in the examples folder you'll find the following file:
./spm1d/examples/stats0d/ex_anova2onermub.py
There are a variety of independent datasets available on the internet for this (unbalanced) design, so I was able to check spm1d's results against them; since the results appear to be correct, I made unbalanced cases accessible without warnings or errors for this design.

For other designs, including anova3tworm, I've not yet found suitable third-party datasets, so I'm not certain that spm1d's results are accurate. They very well might be accurate, but I thought it would be best to restrict access to arbitrary unbalanced designs until spm1d's results can be verified.

Please let me know if you are aware of any published datasets or public examples on the internet that we could use to check the anova3tworm results. Alternatively, we could check the results using random datasets and third-party software like R, so if you need to use anova3tworm with unbalanced data please let me know and I'll try to verify its results as soon as possible.

Todd

Thanks Todd,

Would it help if I provided you with a dataset, which I just collected, that has exactly that design?

Regards,
Bernard

Hi Bernard,
Yes please do send the dataset if possible.
Todd

Many thanks Todd,

I have emailed you directly with the datasets.

Regards,
Bernard

Hi Bernard,

Apologies for the delay. I've looked at the problem a bit more closely, but I am still unable to find a third-party dataset for unbalanced three-way repeated-measures ANOVA, so I'm unable to validate the results I'm getting with spm1d and R. The spm1d results appear to be matching the R results, but without a third-party dataset (and expected results) I'm not 100% confident that my R analyses are correct. To ensure that spm1d returns valid results I prefer to leave it as is, raising an error for unbalanced three-way RM designs. Please let me know if you are aware of any third-party datasets we could use for verification. There may be some buried in software packages like SPSS, S, Minitab, etc.

If we can't find a verification dataset, here are two other options for proceeding:

  • Option 1: Impose balance on your design by removing two of the 16 subjects from Group 1 so that there are 14 subjects in both groups. Then repeat for all 16-choose-2 = 120 combinations of two subjects and check the results. If there are no qualitative changes in the results it would suggest that your dataset is not sensitive to subject substitutions, and I think this type of sensitivity analysis would be completely acceptable to reviewers.
  • Option 2: If the Left and Right limbs are not central to the design, then run two-way RM ANOVA separately on the left and right limbs. For example, if Left-Right represent injured and non-injured limbs, respectively, and if you are interested in the effects of injury, then Left and Right data probably shouldn't be separated. However, if you just measured both Left and Right, and don't have a specific hypothesis regarding Left-Right differences, then it is probably OK to separate the Left and Right data, analyze them separately and then qualitatively compare the results. I think reviewers would find this type of validation completely acceptable provided the Left and Right results are qualitatively similar.
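The subject-substitution loop in Option 1 can be enumerated with itertools (a sketch; the subject labels are hypothetical, and the ANOVA call itself is left as a comment since it depends on how the data are stored):

```python
from itertools import combinations

# Hypothetical labels for the 16 subjects in Group 1.
group1 = [f"subj{i:02d}" for i in range(1, 17)]

# All ways of dropping two Group-1 subjects to leave a balanced 14-vs-14 design.
drop_pairs = list(combinations(group1, 2))
print(len(drop_pairs))  # 16 choose 2 = 120 balanced sub-designs

for pair in drop_pairs:
    kept = [s for s in group1 if s not in pair]
    # ... run the balanced three-way RM ANOVA on `kept` plus Group 2 here,
    # then compare the 120 sets of results qualitatively.
    assert len(kept) == 14
```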

Todd

Dear Todd,

Many thanks for the kind advice. I will brainstorm the options you have suggested. In the meantime, I will notify you if I come across any third-party dataset with an unbalanced three-way design.

Regards,
Bernard