Danko-Lab/BayesPrism

sc.dat is single cell or bulk seq matrix?

trusha0911 opened this issue · 3 comments

Hi!

Just wanted to quickly confirm whether sc.dat is "The cell-by-gene raw count matrix of bulk RNA-seq expression. rownames are bulk cell IDs, while colnames are gene names/IDs." as mentioned in the tutorial, or whether it is the single-cell raw count matrix. And if it is the single-cell matrix, is it all right to use merged data (from several patients) that has undergone QC, or should it be completely unfiltered?

Many thanks!

tinyi commented

Hi,

The file sc.dat represents the scRNA-seq count matrix. Please note that the term "bulk" was a typographical error. I've corrected this in the updated vignette. Thank you for bringing it to our attention.

For optimal results, it's essential to filter and perform quality control (QC) on the input count matrix, in line with standard procedures for processing scRNA-seq data.

Depending on the cell type's heterogeneity across patients, you have two options:

1. If the cell type is of low heterogeneity, you can label each cell type while omitting the patient ID, similar to the approach used for endothelial cells, pericytes and oligodendrocytes in the tutorial.
2. Alternatively, when the cell type exhibits high heterogeneity, you can categorize the cells from each patient / subcluster as a cell state, similar to the approach used for malignant cells and myeloid cells in the tutorial.

Best,
Tinyi
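The two labeling options above can be sketched as follows. This is only an illustration written in Python (BayesPrism itself is an R package), and the cell annotations, patient IDs, and the `HETEROGENEOUS` set are all hypothetical. It splits states by patient only; splitting by subcluster would work the same way with a subcluster ID in place of the patient ID.

```python
from collections import Counter

# Hypothetical per-cell annotations: (cell type, patient ID).
cells = [
    ("endothelial", "P1"),
    ("endothelial", "P2"),
    ("malignant", "P1"),
    ("malignant", "P2"),
    ("myeloid", "P1"),
]

# Assumption: cell types judged to be heterogeneous across patients.
HETEROGENEOUS = {"malignant", "myeloid"}


def make_labels(cells, heterogeneous):
    """Return parallel lists of cell type labels and cell state labels."""
    cell_type_labels = []
    cell_state_labels = []
    for cell_type, patient in cells:
        cell_type_labels.append(cell_type)
        if cell_type in heterogeneous:
            # Option 2: each patient's cells form their own cell state.
            cell_state_labels.append(f"{cell_type}_{patient}")
        else:
            # Option 1: omit the patient ID; state equals cell type.
            cell_state_labels.append(cell_type)
    return cell_type_labels, cell_state_labels


cell_type_labels, cell_state_labels = make_labels(cells, HETEROGENEOUS)

# Count how many cells fall into each cell state.
print(Counter(cell_state_labels))
```

Every cell keeps its plain cell type label, so the states of a heterogeneous cell type can still be grouped back under that type.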

Hi

I have a question, as I'm a little confused. In the tutorial, sc.dat has dimensions 23793 x 60294 and bk.dat has dimensions 169 x 60483.

Does this mean there are around 60K genes, and are they all unique?

Thank you,
Youcef.
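Since the tutorial describes rows as cell IDs and columns as gene names/IDs, the second number in each pair of dimensions is presumably the gene count (60294 and 60483). Whether those gene IDs are all unique, and how many are shared between the two matrices, is easiest to confirm by checking the column names directly. A minimal sketch in Python for illustration only, using made-up gene IDs and a hypothetical helper `summarize_genes`:

```python
from collections import Counter


def summarize_genes(sc_genes, bk_genes):
    """Report duplicate and shared gene IDs for two lists of column names."""
    sc_dupes = sorted(g for g, n in Counter(sc_genes).items() if n > 1)
    bk_dupes = sorted(g for g, n in Counter(bk_genes).items() if n > 1)
    shared = set(sc_genes) & set(bk_genes)
    return {
        "sc_unique": len(set(sc_genes)),
        "bk_unique": len(set(bk_genes)),
        "sc_duplicates": sc_dupes,
        "bk_duplicates": bk_dupes,
        "n_shared": len(shared),
    }


# Toy column names standing in for the real gene IDs of each matrix.
sc_genes = ["ENSG01", "ENSG02", "ENSG03", "ENSG03"]
bk_genes = ["ENSG02", "ENSG03", "ENSG04"]

summary = summarize_genes(sc_genes, bk_genes)
print(summary)
```

If the duplicate lists come back empty, the gene IDs are unique within each matrix, and `n_shared` shows how many genes the two matrices have in common.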