dpeerlab/SEACells

Which modality to use in Multiome for metacell inference


Hi,

In Multiome data, which modality would you recommend using for metacell inference: RNA or ATAC?
I'm asking because this tutorial uses ATAC, but your bioRxiv preprint suggests that ATAC is actually the harder modality:

As a further challenge, we ran Palantir on aggregated RNA from metacells computed on the ATAC modality, since the sparsity of scATAC-seq data renders cell-state identification much more difficult

I know it depends on the biological context, but would you recommend, as a rule of thumb, using RNA instead for metacell inference? What do you think?

Thank you for your time!

Thank you for your query. For peak-gene associations, gene score computation, etc., we do recommend using the ATAC modality for metacell identification.

Thanks for the response!
Alright, so for cell-state definition it's better to use RNA, but for epigenome analysis it's better to use ATAC. Makes sense.
In any case, are you thinking of adding a joint metacell inference step? Something like an averaged aggregation coming from RNA and ATAC at the same time?
Thank you for your time.

Joint metacell inference is an interesting idea, and we are exploring a few options. We think a kernel representation that captures both modalities might be the best option here. For example, MOFA+ generates a low-dimensional embedding from multiple modalities. One can construct the nearest-neighbor graph from the jointly learned embedding and generate a kernel, which can then serve as input to SEACells.
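A minimal sketch of the joint-embedding → kNN graph → kernel step, assuming the joint embedding is already available as a cells × factors array (for example, MOFA+ factors). The helper name `knn_gaussian_kernel`, the adaptive-bandwidth Gaussian kernel, and `k=15` are illustrative assumptions, not SEACells' own implementation, and random numbers stand in for real factors. The SEACells tutorials build the kernel from an embedding stored in `ad.obsm` and named via `build_kernel_on` (e.g. `'X_pca'` or `'X_svd'`), so storing the joint embedding under such a key may be the more direct route into the package.

```python
import numpy as np


def knn_gaussian_kernel(embedding, k=15):
    """Hypothetical helper: symmetric kNN kernel from a joint embedding.

    embedding: (n_cells, n_dims) array, e.g. a MOFA+ factor matrix.
    Each cell gets an adaptive bandwidth equal to the distance to its
    k-th nearest neighbor (an assumption, not SEACells' exact recipe).
    """
    X = np.asarray(embedding, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other cells

    # Dense pairwise Euclidean distances: fine for a few thousand cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor

    # k nearest neighbors of each cell (the kNN graph).
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn_idx.ravel()

    # Adaptive bandwidth per cell; guard against zero for duplicate points.
    sigma = np.maximum(dist[np.arange(n), nn_idx[:, -1]], 1e-12)

    # Gaussian affinities on kNN edges only; all other entries stay 0.
    K = np.zeros((n, n))
    K[rows, cols] = np.exp(-dist[rows, cols] ** 2 / (sigma[rows] * sigma[cols]))

    # Symmetrize: keep an edge if either cell lists the other as a neighbor.
    K = np.maximum(K, K.T)
    np.fill_diagonal(K, 1.0)  # self-affinity, exp(0) = 1
    return K


# Stand-in for a MOFA+ factor matrix (cells x factors); real input would be
# the factors learned jointly from the RNA and ATAC views.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 10))
K = knn_gaussian_kernel(Z, k=15)
print(K.shape)  # (200, 200)
```

The dense distance matrix keeps the sketch short; for real datasets a sparse neighbor search (e.g. scikit-learn's `NearestNeighbors`) would be the usual replacement.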

Thanks for the feedback @ManuSetty! This makes a lot of sense; I'll play around with this idea.