HelenaLC/muscat

full list of genes

Closed this issue · 1 comments

Hi,

Thank you so much for maintaining muscat.
I've followed your tutorial, and I ran successfully (no errors) the DS analysis.


> res <- pbDS(pb, design = mm, contrast = contrast)

># results
>tbl <- res$table[[1]]

> # DE cutoffs
> tbl_fil <- lapply(tbl, function(u) {
>   u <- dplyr::filter(u, p_adj.loc < 0.05, abs(logFC) > 1)
>   dplyr::arrange(u, p_adj.loc)
> })

> n_de <- vapply(tbl_fil, nrow, numeric(1))
> p_de <- format(n_de / nrow(sce) * 100, digits = 3)
> data.frame("#DS" = n_de, "%DS" = p_de, check.names = FALSE)

    #DS    %DS
MS1  24 0.1098
MS2  17 0.0778
MS3  49 0.2242
MS4  24 0.1098

But, I realized I'm not getting results (logFC, padj, etc.) from all genes (21852).

# Pseudo bulk
> pb
class: SingleCellExperiment 
dim: 21852 44 
metadata(2): experiment_info agg_pars
assays(4): MS1 MS2 MS3 MS4
rownames(21852): RP11-34P13.7 RP11-34P13.8 ... AC233755.1 AC240274.1
rowData names(0):
colnames(44): P17H P18F ... E12 E16
colData names(1): group_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
> vapply(tbl_fil, nrow, numeric(1))
MS1 MS2 MS3 MS4 
 24  17  49  24 
> vapply(tbl, nrow, numeric(1))
 MS1  MS2  MS3  MS4 
3701 3641 3950 2749 

Not sure, if pbDS removes low expressed genes before DE or if the results are NA. Why I am getting so low genes? Is it normal?

Any inside will be very well appreciated.
Best,

Yes, indeed. Please have a look at the documentation for pbDS's min_cells and filter and arguments:

  • min_cells: a numeric. Specifies the minimum number of cells in a given cluster-sample required to consider the sample for differential testing.
  • filter: character string specifying whether to filter on genes, samples, both or neither.

This means there are various options for filtering.

  • filter = "none" will disable all filtering
  • "genes" will filter genes via edgeR::filterByExpr
  • "samples" will filter, for each cluster, samples via scater::isOutlier
  • "both" will filter both genes and samples according to the above
  • finally, for each cluster, samples with less than min_cells will also be removed

If you're interested in seeing how this is done exactly, you can have a look at the corresponding code here.

Here's a quick demo of how this can affect the number of clusters and genes tested:

> library(muscat)
> data(example_sce)
> pb <- aggregateData(example_sce)
> 
> # 1267 genes, 5 clusters
> nrow(pb) 
[1] 1267
> length(assays(pb))
[1] 5
> 
> # default parameters
> res <- pbDS(pb,
+     min_cells = 10,
+     filter = "both",
+     verbose = FALSE)
> 
> # 3 clusters, 917-1187 genes tested
> tbl <- res$table[[1]]
> sapply(tbl, nrow)
          B cells       CD8 T cells FCGR3A+ Monocytes 
              917               569              1187 
> 
> # disable all filtering
> res <- pbDS(pb, 
+     min_cells = 0, 
+     filter = "none",
+     verbose = FALSE)
> 
> # 5 clusters, 1203-1266 genes tested
> # (the ones missing aren't detected 
> # in at least 2 samples per group)
> tbl <- res$table[[1]]
> sapply(tbl, nrow)
          B cells   CD14+ Monocytes       CD4 T cells 
             1229              1266              1228 
      CD8 T cells FCGR3A+ Monocytes 
             1203              1266