HelenaLC/muscat

p_adj.loc greater than p_adj.glb

Closed this issue · 2 comments

Hello!

I have noticed that the p_adj.loc is sometimes greater than the p_adj.glb in my differential expression results (using pbDS on an snRNA-seq dataset, with the edgeR method),

To my understanding the p_adj.loc is produced by correcting for multiple testing across all genes tested within a cluster while the p_adj.glb is produced by correcting for multiple testing across all genes tested in all clusters. Thus, p_adj.loc should be less than p_adj.glb.

Here is a small section of the table produced by resDS as an example (rows 3, 7, 16, 18, and 19):

<style> </style>
  gene cluster_id logFC logCPM F p_val p_adj.loc p_adj.glb contrast
1 FO538757.2 Inhib_5 0.297907 6.55732 0.851136 0.361205 0.786318 0.804342 ei.group_idS-ei.group_idC
2 RP5-857K21.4 Inhib_5 -0.45668 6.879999 1.461924 0.233008 0.685881 0.715474 ei.group_idS-ei.group_idC
3 HES4 Inhib_5 -0.14578 5.831962 0.146945 0.703297 0.946466 0.94204 ei.group_idS-ei.group_idC
4 SCNN1D Inhib_5 0.393927 5.304127 0.538035 0.467095 0.849796 0.858966 ei.group_idS-ei.group_idC
5 ACAP3 Inhib_5 1.019394 6.236945 9.745041 0.003155 0.143755 0.196263 ei.group_idS-ei.group_idC
6 ATAD3B Inhib_5 0.690168 5.472104 2.170819 0.147689 0.589724 0.629304 ei.group_idS-ei.group_idC
7 MIB2 Inhib_5 -0.16557 5.665529 0.11623 0.734764 0.956805 0.950395 ei.group_idS-ei.group_idC
8 SLC35E2B Inhib_5 0.572459 5.423179 2.145233 0.150042 0.590007 0.632773 ei.group_idS-ei.group_idC
9 GNB1 Inhib_5 -0.21795 7.569159 1.530005 0.222599 0.678797 0.706937 ei.group_idS-ei.group_idC
10 GABRD Inhib_5 -0.51315 6.940436 5.345291 0.025464 0.321897 0.369583 ei.group_idS-ei.group_idC
11 PRKCZ Inhib_5 0.349367 6.707143 2.446721 0.124854 0.561587 0.600813 ei.group_idS-ei.group_idC
12 SKI Inhib_5 0.226541 6.517301 0.675613 0.415483 0.822396 0.833289 ei.group_idS-ei.group_idC
13 PLCH2 Inhib_5 0.47294 6.617706 4.095984 0.049014 0.408241 0.452765 ei.group_idS-ei.group_idC
14 CEP104 Inhib_5 0.295637 6.111008 0.92458 0.34147 0.770297 0.792691 ei.group_idS-ei.group_idC
15 DFFB Inhib_5 0.555827 5.683095 2.5341 0.118489 0.557515 0.590573 ei.group_idS-ei.group_idC
16 AJAP1 Inhib_5 0.062656 7.534503 0.086387 0.770188 0.965444 0.959339 ei.group_idS-ei.group_idC
17 NPHP4 Inhib_5 0.614426 6.15494 4.304073 0.043834 0.394343 0.437253 ei.group_idS-ei.group_idC
18 KCNAB2 Inhib_5 0.037785 7.168045 0.032431 0.857903 0.986452 0.97939 ei.group_idS-ei.group_idC
19 CHD5 Inhib_5 0.050941 7.044387 0.046852 0.829623 0.979776 0.972831 ei.group_idS-ei.group_idC

I am using version 1.8.0 of muscat.

I would appreciate any information you can provide to help understand this discrepancy.

Thank you!

May be related to #69

plger commented

Hi,
while you are correct about the meaning of the global/local distinction, and that this will tend to lead to lower local adjusted p-values, in fact this is not always the case. Consider the following example:

> set.seed(123)
> pvals <- c(0.01,10^-abs(rnorm(9))) # random p-values, with 0.01 at the beginning
> p.adjust(pvals,method="BH")
 [1] 0.09208109 0.45853565 0.73575382 0.09208109 0.85014227 0.82503003
 [7] 0.09208109 0.49429447 0.13579343 0.41131746
> p.adjust(c(0.01,pvals),method="BH")
 [1] 0.05500000 0.05500000 0.43233361 0.71940373 0.07596690 0.85014227
 [7] 0.81677973 0.07066534 0.47575843 0.11949822 0.37704100

Notice that in the second case, I add another 0.01 at the beginning, and that this leads to a decrease in the adjusted p-value of the existing 0.01 (from ~0.09 to 0.055; others are also changed a bit).
The reason for this is that the correction method works under the assumption that a certain proportion of low p-values are expected under the null hypothesis, and looks for true signal in an excess of such p-values. Hence the presence of other such similar p-values decreases the probability that any of them is due to chance.

Thank you @plger! This makes sense.