full list of genes
Closed this issue · 1 comments
Hi,
Thank you so much for maintaining muscat
.
I've followed your tutorial, and I ran successfully (no errors) the DS analysis.
> res <- pbDS(pb, design = mm, contrast = contrast)
># results
>tbl <- res$table[[1]]
> # DE cutoffs
> tbl_fil <- lapply(tbl, function(u) {
> u <- dplyr::filter(u, p_adj.loc < 0.05, abs(logFC) > 1)
> dplyr::arrange(u, p_adj.loc)
> })
> n_de <- vapply(tbl_fil, nrow, numeric(1))
> p_de <- format(n_de / nrow(sce) * 100, digits = 3)
> data.frame("#DS" = n_de, "%DS" = p_de, check.names = FALSE)
#DS %DS
MS1 24 0.1098
MS2 17 0.0778
MS3 49 0.2242
MS4 24 0.1098
But, I realized I'm not getting results (logFC, padj, etc.) from all genes (21852).
# Pseudo bulk
> pb
class: SingleCellExperiment
dim: 21852 44
metadata(2): experiment_info agg_pars
assays(4): MS1 MS2 MS3 MS4
rownames(21852): RP11-34P13.7 RP11-34P13.8 ... AC233755.1 AC240274.1
rowData names(0):
colnames(44): P17H P18F ... E12 E16
colData names(1): group_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
> vapply(tbl_fil, nrow, numeric(1))
MS1 MS2 MS3 MS4
24 17 49 24
> vapply(tbl, nrow, numeric(1))
MS1 MS2 MS3 MS4
3701 3641 3950 2749
Not sure, if pbDS
removes low expressed genes before DE or if the results are NA
. Why I am getting so low genes? Is it normal?
Any inside will be very well appreciated.
Best,
Yes, indeed. Please have a look at the documentation for pbDS
's min_cells
and filter
and arguments:
min_cells
: a numeric. Specifies the minimum number of cells in a given cluster-sample required to consider the sample for differential testing.filter
: character string specifying whether to filter on genes, samples, both or neither.
This means there are various options for filtering.
filter = "none"
will disable all filtering"genes"
will filter genes viaedgeR::filterByExpr
"samples"
will filter, for each cluster, samples viascater::isOutlier
"both"
will filter both genes and samples according to the above- finally, for each cluster, samples with less than
min_cells
will also be removed
If you're interested in seeing how this is done exactly, you can have a look at the corresponding code here.
Here's a quick demo of how this can affect the number of clusters and genes tested:
> library(muscat)
> data(example_sce)
> pb <- aggregateData(example_sce)
>
> # 1267 genes, 5 clusters
> nrow(pb)
[1] 1267
> length(assays(pb))
[1] 5
>
> # default parameters
> res <- pbDS(pb,
+ min_cells = 10,
+ filter = "both",
+ verbose = FALSE)
>
> # 3 clusters, 917-1187 genes tested
> tbl <- res$table[[1]]
> sapply(tbl, nrow)
B cells CD8 T cells FCGR3A+ Monocytes
917 569 1187
>
> # disable all filtering
> res <- pbDS(pb,
+ min_cells = 0,
+ filter = "none",
+ verbose = FALSE)
>
> # 5 clusters, 1203-1266 genes tested
> # (the ones missing aren't detected
> # in at least 2 samples per group)
> tbl <- res$table[[1]]
> sapply(tbl, nrow)
B cells CD14+ Monocytes CD4 T cells
1229 1266 1228
CD8 T cells FCGR3A+ Monocytes
1203 1266