ggloor/ALDEx_bioc

interpretation of t test conflicting results

Closed this issue · 2 comments

Hello,

I ran the following settings after subsetting my original dataset to only include the two groups for the paired analysis:

conds <- c(rep("200_ctrl", 3), rep("7.5_EV", 3))
x.all <- aldex(p2EC_7.5, conds, mc.samples=128, test="t", effect=TRUE,
include.sample.summary=FALSE, denom="all", verbose=FALSE)

I have attached the p2EC_7.5 matrix I used as input.

I am a bit confused about which setting to use for denom, so I am not sure whether this is part of the issue. When I run these settings, Welch's test shows that basically everything is significantly different, but the Wilcoxon test shows that nothing is significantly different. I have no idea how to interpret this, and I guess I made a mistake somewhere? I would really appreciate some advice on this.

Thank you
Emma

p2EC_7.5.tsv.gz

Hi Emma

Thank you for providing the test dataset. It is most helpful for troubleshooting. No mistake, just a weird dataset! :-)

There are at least two independent issues here, and some interpretation is needed.

First, and this is the easy one: with 3 samples in each group, a Wilcoxon test does not have enough power to detect BH-corrected significance. Thus, we.eBH (the BH-corrected Welch's t-test value) is the preferred test.
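To make the power argument concrete: with three samples per group there are only choose(6, 3) = 20 ways to assign the group labels, so the smallest two-sided p-value an exact rank-sum test can produce is 2/20 = 0.1, and a BH-adjusted value can never be smaller than the raw one. A minimal base R sketch, using made-up values rather than the attached dataset:

```r
# Hypothetical, perfectly separated values: the most extreme outcome
# possible for two groups of three samples.
ctrl <- c(1, 2, 3)
ev   <- c(4, 5, 6)

# Exact two-sided Wilcoxon rank-sum (Mann-Whitney) test.
p_min <- wilcox.test(ctrl, ev, exact = TRUE)$p.value

# Number of ways to split 6 samples into two groups of 3.
n_arrangements <- choose(6, 3)

print(p_min)               # smallest p-value this design can produce
print(2 / n_arrangements)  # the same value from the counting argument
```

Since even the raw p-value cannot drop below 0.1 here, no feature can pass a 0.05 threshold with the rank-based test, whatever the data look like.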

Second, when examining your dataset, there is a marked asymmetry in the bulk of the features (that is, a histogram or density plot of x.all$diff.btw shows that the majority of features are not centred on no change). In your dataset, the data can be centred with either denom='lvha' or denom='iqlr'. Both work more or less the same, but I would prefer lvha as it makes fewer assumptions about the data. This centres the data.
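A quick numeric check of the asymmetry described here is the median of the between-group differences: if the bulk of features were centred on no change, it would sit near 0. The sketch below uses simulated numbers as a stand-in for x.all$diff.btw (the vector diff_btw and its values are assumptions for illustration, not taken from the attached data):

```r
set.seed(42)

# Simulated stand-in for x.all$diff.btw: most features sit away from 0,
# with a smaller group of features far on the other side.
diff_btw <- c(rnorm(900, mean = 1.5, sd = 0.3),
              rnorm(100, mean = -4,  sd = 1))

# Location of the bulk; a value far from 0 signals an off-centre dataset.
bulk_location <- median(diff_btw)
print(bulk_location)

# Subtracting the bulk location only illustrates what "centred" looks like;
# in practice the centring comes from the choice of denom, not from this step.
centred <- diff_btw - bulk_location
print(median(centred))
```

With real output, the same check would be median(x.all$diff.btw), alongside the histogram or density plot mentioned above.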

Third, after lvha centring, the majority of features are still significantly different because there is essentially no variance between groups in the MW plot (most features have dispersion below 1). Examination of the MA plot shows that your low variance features are also the most abundant. This has the effect of giving very low p values because there is almost no measurement error and very low experimental difference between groups.
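The effect of very low dispersion on the p-value can be seen with a plain Welch's t-test in base R. This is only an illustration on made-up numbers; ALDEx2 computes its own values across Monte Carlo instances, so the figures here are not what aldex() would report:

```r
# Made-up values: a small shift (0.1) between groups with almost no spread.
ctrl <- c(1.00, 1.01, 0.99)
ev   <- c(1.10, 1.11, 1.09)

# t.test() uses Welch's unequal-variance test by default (var.equal = FALSE).
res <- t.test(ctrl, ev)

# Difference between the group means (ev minus ctrl).
mean_diff <- unname(res$estimate[2] - res$estimate[1])

print(mean_diff)    # small difference
print(res$p.value)  # small p-value, driven by the tiny within-group spread
```

The difference is small in absolute terms, yet the test statistic is large because the denominator (the standard error) is close to zero.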

So, what I would do in this case is use lvha to centre the data, then apply a difference cutoff of +/- 2 (or even +/- 5) together with we.eBH significance to filter out the low variance features. You have two groups of features that have a very large difference between groups (abs(x.all$diff.btw) > 5), and these are the features that are most different between your groups and likely what you are looking for.
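The combined filter can be sketched in base R as follows. The data frame x_toy is a hypothetical stand-in holding only the two columns used for filtering, and the 0.05 threshold for we.eBH is an assumption, since no significance level is stated in this thread:

```r
# Hypothetical stand-in for the data frame returned by aldex();
# only the two columns needed for the filter are included.
x_toy <- data.frame(
  diff.btw  = c(0.3, -0.8, 6.2, -5.4, 2.5, 7.1),
  we.eBH    = c(0.001, 0.002, 0.004, 0.010, 0.200, 0.030),
  row.names = paste0("feature_", 1:6)
)

diff_cutoff <- 2     # +/- 2 as suggested; 5 is the stricter alternative
alpha       <- 0.05  # assumed significance threshold

# Keep features that are both far from no change and significant.
keep <- abs(x_toy$diff.btw) > diff_cutoff & x_toy$we.eBH < alpha
hits <- x_toy[keep, ]

print(hits)
```

The first two toy features are significant but have small differences, which is the low-variance case the cutoff is meant to remove. On real output the same expression would be applied to x.all after rerunning with denom="lvha".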

Hope this helps

Greg

Hi Greg

Thanks for getting back to me so quickly with such a detailed explanation; it makes much more sense now. Could you recommend a resource where I can learn more about how to decide which statistical tests and corrections are appropriate for genomic data?

I tried with lvha and got a warning:

In cmultRepl(t(reads), label = 0, method = "CZM", suppress.print = TRUE) :
Column(s) containing more than 80% zeros/unobserved values were found (check it out using zPatterns).
(You can use the z.warning argument to modify the warning threshold).

I also saw this note in the vignette section about the different methods to correct for asymmetry:
IMPORTANT: no rows should contain all 0 values as they will be removed by the aldex.clr function

It says they will be removed, but should I actually be removing these myself from the matrix prior to running the aldex() function?
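For reference, a minimal base R sketch of how such rows could be identified and dropped before calling aldex(). It assumes features are in rows and samples in columns; the matrix reads is a made-up example, and whether pre-filtering is actually needed is the open question here:

```r
# Hypothetical count matrix: rows are features, columns are samples.
reads <- matrix(
  c(10, 12,  9, 11, 13,  8,
     0,  0,  0,  0,  0,  0,
     0,  0,  0,  0,  0,  5,
     3,  0,  4,  2,  0,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature_", 1:4), paste0("sample_", 1:6))
)

# Features with no counts in any sample.
all_zero <- rowSums(reads) == 0

# Fraction of zero counts per feature; the warning mentions an 80% threshold.
zero_frac <- rowMeans(reads == 0)

print(names(which(all_zero)))         # features that are entirely zero
print(names(which(zero_frac > 0.8)))  # features above the 80% threshold

# Drop only the all-zero features, keeping the matrix structure intact.
reads_filtered <- reads[!all_zero, , drop = FALSE]
print(dim(reads_filtered))
```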

Thank you!
Emma