HelenaLC/muscat

Warning for running mmDS

Opened this issue · 6 comments

mm <- mmDS(B[1:5, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)

Testing 5 genes across 37618 cells in cluster “Mye”...
[1] "~group_id+(1|sample_id)"
Argument 'coef' not specified; testing for “group_idHealthy”.
Dividing work into 5 chunks...
iteration: 5

Total:3 s
Warning messages:
1: In DGEList(counts(x), norm.factors = 1/sizeFactors(x)) :
library size of zero detected
2: In DGEList(counts(x), norm.factors = 1/sizeFactors(x)) :
norm factors don't multiply to 1
3: In dream(v, formula, cd, contrast, ddf, BPPARAM = bp, suppressWarnings = !verbose, :
Contrasts with only a single non-zero term are already evaluated by default.

However, I have calculate the sizeFactor using scran and there is no cell with zero sizeFactor

summary(sizeFactors(B))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1000 0.3795 0.6933 1.0402 1.3879 5.7137

Also, when I run

  1. mm <- mmDS(B[1:5, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)
  2. mm1 <- mmDS(B[1:13, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)

For the first 5 genes, it gave me totally differnt results.

m
$Mye
gene cluster_id logFC AveExpr t p_val p_adj.loc
1 AL627309.1 Mye -0.002193570 18.88785 -0.1583302 0.8774116 0.8774116
2 AL669831.5 Mye 0.012443001 18.91737 0.6389820 0.5384964 0.8774116
3 FAM87B Mye -0.003013743 18.88637 -0.2087911 0.8388561 0.8774116
4 LINC00115 Mye -0.021901137 18.92037 -1.6531435 0.1368044 0.6840218
5 FAM41C Mye 0.002539270 18.90313 0.3267231 0.7523846 0.8774116
p_adj.glb
1 1.0000000
2 1.0000000
3 1.0000000
4 0.6840218
5 1.0000000

mm1
$Mye
gene cluster_id logFC AveExpr t p_val
1 AL627309.1 Mye 3.2182537 15.31577 5.7655580 1.848316e-04
2 AL669831.5 Mye 3.2314543 15.34529 5.7686655 1.838992e-04
3 FAM87B Mye 3.2174258 15.31429 5.7665500 1.845354e-04
4 LINC00115 Mye 3.1976050 15.34830 5.6860652 2.062461e-04
5 FAM41C Mye 3.2243391 15.33105 5.7865427 1.795426e-04
6 NOC2L Mye 3.1473971 15.54220 5.0692537 4.912790e-04
7 KLHL17 Mye 3.2097299 15.33036 5.7298683 1.940235e-04
8 PLEKHN1 Mye 3.1847757 15.33759 5.7781740 1.818259e-04
9 AL645608.8 Mye 3.1277136 15.35627 5.8886284 1.570773e-04
10 HES4 Mye 2.2910967 15.93694 7.2626197 3.228388e-05
11 ISG15 Mye -0.5153947 19.34414 -0.8883452 3.952200e-01
12 AL645608.2 Mye 3.1954702 15.32648 5.7799704 1.811552e-04
13 AGRN Mye 2.9120996 15.54731 4.8283202 7.014312e-04
p_adj.loc p_adj.glb
1 0.0002681199 0.0019749691
2 0.0002681199 0.0019749691
3 0.0002681199 0.0019749691
4 0.0002681199 0.0019749691
5 0.0002681199 0.0019749691
6 0.0005806024 0.0019749691
7 0.0002681199 0.0019749691
8 0.0002681199 0.0019749691
9 0.0002681199 0.0018849275
10 0.0002681199 0.0004196904
11 0.3952200062 0.3952200062
12 0.0002681199 0.0019749691
13 0.0007598838 0.0019749691

I will appreciate your help.

plger commented

Both issues should disappear in the latest github push (master branch).

There are new error for the updated verison.

Testing 13 genes across 34759 cells in cluster “Mye”...
Error in DGEList(counts(x), lib.sizes = sizeFactors(x), norm.factors = rep(1, :
unused argument (lib.sizes = sizeFactors(x))

mm <- mmDS(B[1:50, ], method = "dream", n_cells = 10, n_samples = 2, min_count = 1, min_cells = 20)
Testing 13 genes across 34759 cells in cluster “Mye”...
Error in DGEList(counts(x), lib.sizes = sizeFactors(x), norm.factors = rep(1, :
unused argument (lib.sizes = sizeFactors(x))

Also, if I want to use imputed data such as SAVER output. How can I adjust the code to do so? I notice that the default input the the subject is 'count'.

Thanks so much, plger!

plger commented

That was a typo, should be fixed now.
At the moment there's not really an in-built possibility to use non-counts inputs to mmDS. One possibility would be to pretend that they are counts (i.e. put it in the counts assay), and use a method that should be robust to it, like dream/dream2. But you'd need to have it on the same scale for the filtering to work, and to set your sizeFactors to 1 (assuming it's already normalized).
We'll consider adding an option to run the normal LMM on custom data.
This being said, I'm not sure it's a good idea to use imputed data for differential expression (I vaguely remember papers showing errors stemming from it).

Thanks, plger. Since ‘Dream’ only uses the lognormalized data as input, can you write an option to by-pass the filtering? I try to set the minimal count as 0, but it didn't work.

Here is a paper discussing the effect of impuation

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x