Warning for running mmDS

Question

Warning for running mmDS

Opened this issue 4 years ago · 6 comments

mm <- mmDS(B[1:5, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)

Testing 5 genes across 37618 cells in cluster “Mye”...
[1] "~group_id+(1|sample_id)"
Argument 'coef' not specified; testing for “group_idHealthy”.
Dividing work into 5 chunks...
iteration: 5

Total:3 s
Warning messages:
1: In DGEList(counts(x), norm.factors = 1/sizeFactors(x)) :
library size of zero detected
2: In DGEList(counts(x), norm.factors = 1/sizeFactors(x)) :
norm factors don't multiply to 1
3: In dream(v, formula, cd, contrast, ddf, BPPARAM = bp, suppressWarnings = !verbose, :
Contrasts with only a single non-zero term are already evaluated by default.

However, I have calculate the sizeFactor using scran and there is no cell with zero sizeFactor

summary(sizeFactors(B))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1000 0.3795 0.6933 1.0402 1.3879 5.7137

Answer 1 · 2020-08-28T19:32:14.000Z

Also, when I run

mm <- mmDS(B[1:5, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)
mm1 <- mmDS(B[1:13, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)

For the first 5 genes, it gave me totally differnt results.

m
$Mye
gene cluster_id logFC AveExpr t p_val p_adj.loc
1 AL627309.1 Mye -0.002193570 18.88785 -0.1583302 0.8774116 0.8774116
2 AL669831.5 Mye 0.012443001 18.91737 0.6389820 0.5384964 0.8774116
3 FAM87B Mye -0.003013743 18.88637 -0.2087911 0.8388561 0.8774116
4 LINC00115 Mye -0.021901137 18.92037 -1.6531435 0.1368044 0.6840218
5 FAM41C Mye 0.002539270 18.90313 0.3267231 0.7523846 0.8774116
p_adj.glb
1 1.0000000
2 1.0000000
3 1.0000000
4 0.6840218
5 1.0000000

mm1
$Mye
gene cluster_id logFC AveExpr t p_val
1 AL627309.1 Mye 3.2182537 15.31577 5.7655580 1.848316e-04
2 AL669831.5 Mye 3.2314543 15.34529 5.7686655 1.838992e-04
3 FAM87B Mye 3.2174258 15.31429 5.7665500 1.845354e-04
4 LINC00115 Mye 3.1976050 15.34830 5.6860652 2.062461e-04
5 FAM41C Mye 3.2243391 15.33105 5.7865427 1.795426e-04
6 NOC2L Mye 3.1473971 15.54220 5.0692537 4.912790e-04
7 KLHL17 Mye 3.2097299 15.33036 5.7298683 1.940235e-04
8 PLEKHN1 Mye 3.1847757 15.33759 5.7781740 1.818259e-04
9 AL645608.8 Mye 3.1277136 15.35627 5.8886284 1.570773e-04
10 HES4 Mye 2.2910967 15.93694 7.2626197 3.228388e-05
11 ISG15 Mye -0.5153947 19.34414 -0.8883452 3.952200e-01
12 AL645608.2 Mye 3.1954702 15.32648 5.7799704 1.811552e-04
13 AGRN Mye 2.9120996 15.54731 4.8283202 7.014312e-04
p_adj.loc p_adj.glb
1 0.0002681199 0.0019749691
2 0.0002681199 0.0019749691
3 0.0002681199 0.0019749691
4 0.0002681199 0.0019749691
5 0.0002681199 0.0019749691
6 0.0005806024 0.0019749691
7 0.0002681199 0.0019749691
8 0.0002681199 0.0019749691
9 0.0002681199 0.0018849275
10 0.0002681199 0.0004196904
11 0.3952200062 0.3952200062
12 0.0002681199 0.0019749691
13 0.0007598838 0.0019749691

I will appreciate your help.

Answer 2 · 2020-08-31T09:11:22.000Z

Both issues should disappear in the latest github push (master branch).

Answer 3 · 2020-09-01T14:06:02.000Z

There are new error for the updated verison.

Testing 13 genes across 34759 cells in cluster “Mye”...
Error in DGEList(counts(x), lib.sizes = sizeFactors(x), norm.factors = rep(1, :
unused argument (lib.sizes = sizeFactors(x))

Answer 4 · 2020-09-01T14:07:38.000Z

mm <- mmDS(B[1:50, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)
Testing 13 genes across 34759 cells in cluster “Mye”...
Error in DGEList(counts(x), lib.sizes = sizeFactors(x), norm.factors = rep(1, :
unused argument (lib.sizes = sizeFactors(x))

Also, if I want to use imputed data such as SAVER output. How can I adjust the code to do so? I notice that the default input the the subject is 'count'.

Thanks so much, plger!

Answer 5 · 2020-09-01T14:28:07.000Z

That was a typo, should be fixed now.
At the moment there's not really an in-built possibility to use non-counts inputs to mmDS. One possibility would be to pretend that they are counts (i.e. put it in the counts assay), and use a method that should be robust to it, like dream/dream2. But you'd need to have it on the same scale for the filtering to work, and to set your sizeFactors to 1 (assuming it's already normalized).
We'll consider adding an option to run the normal LMM on custom data.
This being said, I'm not sure it's a good idea to use imputed data for differential expression (I vaguely remember papers showing errors stemming from it).

Answer 6 · 2020-09-02T13:39:08.000Z

Thanks, plger. Since ‘Dream’ only uses the lognormalized data as input, can you write an option to by-pass the filtering? I try to set the minimal count as 0, but it didn't work.

Here is a paper discussing the effect of impuation

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x