Specifying contrasts in MUSCAT mixed models

Question

Specifying contrasts in MUSCAT mixed models

Opened this issue a year ago · 6 comments

Dear MUSCAT team,
@HelenaLC @plger
I hope my message finds you all fine. I have a small question for you:
I would like to test the difference in gene expression between 3 groups: A, B and C (where group C is the control group, and thus the reference level of the group factor). I am interested in the following comparisons: A vs. C, B vs. C & A vs. C. I would like to like to test this using mmDS.
I ran the function as follows:

mm <- mmDS(sce, method = "dream2",
n_cells = 10, n_samples = 2,
min_count = 1, min_cells = 20,
coef = c("group_idA","group_idB"), covs = "scaled.cdr")

where scaled.cdr is a scaled vector of the cellular detection rate (number of genes expressed in each cell)

The result looked like this:
List of 1
$ CellType1:'data.frame': 6586 obs. of 9 variables:
..$ gene : chr [1:6586] "AL627309.1" "AP006222.2" "RP4-669L17.10" "RP11-206L10.3" ...
..$ cluster_id : chr [1:6586] "CellType1" "CellType1" "CellType1" "CellType1" ...
..$ group_idA : num [1:6586] -0.0471 -0.0226 -0.0489 -0.0492 -0.0564 ...
..$ group_idB: num [1:6586] -0.0327 -0.0215 -0.0365 -0.0838 -0.0355 ...
..$ AveExpr : num [1:6586] 18.1 18 18 18.2 18 ...
..$ F : num [1:6586] 1.31 0.31 1.34 1.95 1.93 ...
..$ p_val : num [1:6586] 0.28 0.735 0.273 0.154 0.157 ...
..$ p_adj.loc : num [1:6586] 0.856 0.891 0.856 0.821 0.825 ...
..$ p_adj.glb : num [1:6586] 0.856 0.891 0.856 0.821 0.825 ...

So I have the coefficients representing the comparisons A vs. C (group_idA) & B vs. C (group_idB) and a single P-value which I assume represents the likelihood ratio test result after dropping the whole "group" variable from the model (Please correct me if I am wrong ?)

My questions is how I can test the significance of each comparison (A vs. C, B vs. C & A vs. B) individually and get a separate P-value for each comparison using mixed-effects models?

I know that in pseudobulk analysis with pbDS, it is possible to achieve this by defining a design matrix and a contrast matrix as described in the vignette. But how can I achieve this in mixed models?

Thank you very much.

Sincerely,
Ismail

Answer 1 · 2023-04-17T15:02:26.000Z

I have the same question. I appreciate any input.
Paria

Answer 2 · 2023-04-17T15:17:32.000Z

The current way these methods are implemented in mmDS do not support the use of contrasts, so you have to rely on the coefficients directly.

Assuming that group_idC are your controls, then ensure that this is the base level of the factor, and use coef="group_idA" to get A vs C, and coef="group_idB" to get B vs C.
To get A vs B, you'd have to refactor so that B is the base level, and then use coef="group_idA" again.

Answer 3 · 2023-04-17T15:44:53.000Z

Thanks for your response @plger . I was wondering if this is correct when we have 6 different group id with below comparison. I also would like to regress batch and sex effect.

Moreover, when I compare two level of group_id should I subset those two level? if I shouldn't (I didn't do in my analysis) do other levels cause error in the analysis or the model count for this?
Thank you so much!
Paria

Answer 4 · 2023-04-18T09:05:10.000Z

@plger I see, thank you very much for your kind reply and clarification.

@pariaaliour your question is a good one. I would assume you should keep all levels of the "group_id" variable such that you retain all available information in your model and do not throw groups out. In this way you would get a better estimate of the variability of gene expression across the different groups, what do you think @plger ?

@plger One additional point related to @pariaaliour question : the P-value I get in the above example indicates the probability that the difference between any of the 3 groups A, B or C occurred by chance, is that correct? If this is indeed correct, then to be able to get P-values for one specific comparison e.g. A vs. C or A vs. B or B vs. C, one should indeed subset the data selecting only the two interesting groups and run the analysis on that particular subset. But again, this would mean we have to throw away the information from the third group which is not what I wanna do. Ideally, I would like to perform something similar to an ANOVA using all 3 groups followed by a posthoc test for all pairwise comparisons. Is this achievable using mmDS? Could you kindly share your thoughts about that?

Answer 5 · 2023-04-18T10:47:40.000Z

With coef=c("group_idA"), the p-value is that of a specific comparison (between A and the base factor level). Very leniently interpreted, it's the probability of having that difference between A and C by chance. With coef=c("group_idA","group_idB"), both coefficients are dropped, so the question boils down to whether a model without groups fits the data significantly less than a model with groups. Put differently, whether the gene changes across (any of the) groups.

There's no need to subset, and I'd also leave the other groups in (in a bulk setting the main reason would be for the overdispersion estimates to be more accurate; but using dream this would rather be for the distribution of the random effects), unless the differences between them are massive.

All this being said, we and others have shown that, unless you need to correct for cell-level covariates (doesn't seem to be the case here), there's no advantage in using mixed models over pseudobulk aggregation. (In fact, generally speaking the touted superiority of mixed models over aggregation is a statistical myth -- across most settings anyway).

Answer 6 · 2023-04-19T13:14:28.000Z

@plger Thank you very much for this very clear explanation. I really appreciate it 😃