cozygene/bisque

CountsToCPM after extracting markers

randel opened this issue · 2 comments

Thanks for the new approach! In ReferenceBasedDecomposition, if markers are provided, only the sc and bulk data for those markers are passed to CountsToCPM. Does it make sense to calculate CPM for (say, hundreds of) marker genes only? Or does it make more sense to calculate CPM for all genes and then take the subset of markers?

bisque/R/reference_based.R

Lines 298 to 315 in ef5bae0

if (base::is.null(markers)) {
  markers <- Biobase::featureNames(sc.eset)
}
else {
  markers <- base::unique(base::unlist(markers))
}
genes <- GetOverlappingGenes(sc.eset, bulk.eset, markers, verbose)
sc.eset <-
  Biobase::ExpressionSet(assayData=Biobase::exprs(sc.eset)[genes,],
                         phenoData=sc.eset@phenoData)
bulk.eset <-
  Biobase::ExpressionSet(assayData=Biobase::exprs(bulk.eset)[genes,],
                         phenoData=bulk.eset@phenoData)
if (verbose) {
  base::message("Converting single-cell counts to CPM and ",
                "filtering zero variance genes.")
}
sc.eset <- CountsToCPM(sc.eset)

Hi @randel, thanks for your interest in our method and the great question!

I can see how this is an issue in cases like the extreme example of having only one marker gene (every sample would have values of 0 or 1,000,000, including the reference). I am assuming that this issue becomes less significant as more marker genes are used; however, I will look into switching the order of operations here to prevent it from arising. Thanks for pointing this out! I'll close this issue after I've finished testing.
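To make the difference between the two orderings concrete, here is a small base-R sketch. The count matrix is made up, and `counts_to_cpm` is a hypothetical stand-in written for this example; it is not Bisque's `CountsToCPM`, which may handle edge cases differently.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns). Values are invented.
counts <- matrix(c( 10,  0,  5,
                    90, 50, 45,
                   400, 25, 50,
                   500, 25,  0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3", "g4"),
                                 c("s1", "s2", "s3")))

# Hypothetical helper (an assumption, not the package function):
# scale every column so that it sums to one million.
counts_to_cpm <- function(x) {
  sweep(x, 2, colSums(x), "/") * 1e6
}

markers <- c("g1", "g2")

# Ordering 1: subset to markers first, then normalize.
# Library size is computed from the marker genes only.
cpm_subset_first <- counts_to_cpm(counts[markers, , drop = FALSE])

# Ordering 2: normalize over all genes, then subset to markers.
# Library size reflects the full sample.
cpm_all_first <- counts_to_cpm(counts)[markers, , drop = FALSE]

# Extreme case: a single marker. Every sample with a nonzero count
# for that gene ends up at exactly 1,000,000.
cpm_one_marker <- counts_to_cpm(counts["g2", , drop = FALSE])

print(cpm_subset_first)  # columns sum to 1e6 over the markers alone
print(cpm_all_first)     # columns sum to less than 1e6
print(cpm_one_marker)
```

In this toy data, g1 in sample s1 comes out at 100,000 CPM when subsetting first but 10,000 CPM when normalizing over all genes first, because the marker-only library size (100) is a tenth of the full one (1,000). Note that in this sketch a sample with zero counts across all markers would produce NaN rather than 0.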

Added an option, old.cpm, to change the ordering. The old ordering is kept as an option so that older results can be replicated. Thanks!