Question: Does well separated color in the dendrogram mean that the items are distinct?
DongzeHE opened this issue · 1 comments
Hello,
Thanks for providing this interesting cell clustering method. I am using it to analyze the similarity between the spliced and unspliced count matrices from the same single-cell RNA-seq dataset, so the two matrices are two types of signal from the same sample. The result returned by TooManyCells is interesting, and I hope you can help me understand what it tells us. Thank you in advance!
I gave TooManyCells the spliced and unspliced raw count matrices and labeled them "S" for spliced and "U" for unspliced. The result shows that the items from the two matrices are well separated and have no overlap.
If possible, could you please help me understand the result? I have the following questions:
- Do I need to normalize or scale the two matrices before running TooManyCells?
- If giving the raw count matrices is the correct thing to do, does this result (S and U items are separated) mean that the two types of signals from the same sample are totally different?
- Why is the unspliced side of the tree larger than the spliced side? Does this mean that unspliced can separate the data better?
- If the result shows that the two types of signals are totally different, how could I show that both of them are biologically meaningful? For example, do you think that finding rare cell types from the items in each matrix separately will show that the two matrices are different and both biologically meaningful? Are there any other things I can do to show that they are both biologically meaningful?
Thanks so much! I am looking forward to your reply!
Best,
Dongze
Sorry, this went under my radar.
- You can normalize in TooManyCells using a variety of different methods, see `too-many-cells make-tree -h`. By default, TF-IDF is used.
- Yes, in theory, but it depends on the normalization assumptions.
- What do you mean by larger? Size is indicated by node size, not the area of the tree, so they look pretty similar to me.
- You can try different normalizations if you expect there to be more mixing and TF-IDF (followed by cosine similarity) is insufficient.
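
To make the TF-IDF and cosine similarity point concrete, here is a minimal sketch in Python with NumPy. It is not TooManyCells's actual implementation: the `tfidf` and `cosine_similarity` helpers, the exact TF-IDF formula, and the toy counts are all assumptions chosen for illustration. It only shows how cells with different gene profiles end up with low similarity after this kind of weighting, which is the situation where two labels would fall on separate sides of a tree.

```python
import numpy as np


def tfidf(counts):
    """Generic TF-IDF weighting for a cells x genes count matrix.

    Assumed formula for illustration; the tool's own variant may differ.
    """
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each cell's counts divided by that cell's total count.
    totals = counts.sum(axis=1, keepdims=True)
    tf = counts / np.where(totals == 0, 1.0, totals)
    # Inverse document frequency: genes detected in fewer cells weigh more.
    n_cells = counts.shape[0]
    cells_with_gene = (counts > 0).sum(axis=0)
    idf = np.log(n_cells / np.maximum(cells_with_gene, 1))
    return tf * idf


def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


# Hypothetical toy data: two "S" cells and two "U" cells over four genes.
labels = ["S", "S", "U", "U"]
counts = np.array([
    [10, 8, 0, 1],
    [12, 9, 1, 0],
    [0, 1, 7, 9],
    [1, 0, 8, 11],
])

sim = cosine_similarity(tfidf(counts))
print(labels)
print(np.round(sim, 3))
```

In this toy matrix the within-label similarities are much higher than the between-label ones. Swapping `tfidf` for another normalization and comparing the resulting similarity matrices is one rough way to see how much the separation depends on the normalization assumptions.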