scMM: Mixture-of-experts multimodal deep generative model for single-cell multiomics analysis

scMM is a novel deep generative model-based framework for the extraction of interpretable joint representations and cross-modal generation for single-cell multiomics data (e.g. transcriptome & chromatin accessibility, transcriptome & surface proteins). It is based on a mixture-of-experts multimodal deep generative model and achieves end-to-end learning by modeling raw count data in each modality based on different probability distributions.

colab_tutorial.ipynb shows how to run scMM on a GPU in Google Colab. The tutorial uses toy data derived from CITE-seq (single-cell transcriptome & surface protein) data for bone marrow mononuclear cells (BMNC), consisting of 15,000 randomly subsampled cells (Stuart and Butler et al., 2018). The 5,000 most variable genes were selected for the transcriptome data.

RNA and protein count matrices should be stored in folders named RNA-seq and CITE-seq, respectively, accompanied by feature information stored in gene.tsv and protein.tsv. Single-cell barcodes, stored in barcode.tsv, should also be included in each folder. When running on chromatin accessibility data, name the folder ATAC-seq and the feature file peak.tsv. For example, the folder structure looks like:

data/BMNC
     |---RNA-seq
     |   |---RNA_count.mtx
     |   |---gene.tsv
     |   |---barcode.tsv
     |---CITE-seq
         |---Protein_count.mtx
         |---protein.tsv
         |---barcode.tsv
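
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch using only the Python standard library; it is not part of scMM. The names `EXPECTED`, `missing_files`, and `barcodes_match` are hypothetical helpers introduced here for illustration, and the requirement that both modalities list the same barcodes in the same order is an assumption about paired data, not something stated in this README.

```python
from pathlib import Path

# Files expected in each modality folder, taken from the example tree above.
# For chromatin accessibility data, supply your own mapping for ATAC-seq
# (the README names only the folder and peak.tsv, not the matrix file).
EXPECTED = {
    "RNA-seq": ["RNA_count.mtx", "gene.tsv", "barcode.tsv"],
    "CITE-seq": ["Protein_count.mtx", "protein.tsv", "barcode.tsv"],
}


def missing_files(root, expected=EXPECTED):
    """Return the relative paths of expected files that are absent under root."""
    root = Path(root)
    missing = []
    for folder, names in expected.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


def barcodes_match(root, folders=("RNA-seq", "CITE-seq")):
    """Check that barcode.tsv is identical, line by line, across folders.

    Assumption: paired modalities describe the same cells in the same order.
    """
    root = Path(root)
    lists = [(root / f / "barcode.tsv").read_text().splitlines() for f in folders]
    return all(b == lists[0] for b in lists[1:])
```

For example, `missing_files("data/BMNC")` returns an empty list when every file in the tree above is present, and `barcodes_match("data/BMNC")` can then be used to compare the two barcode files.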

A tutorial on downstream analysis of scMM outputs can be found at R/tutorial.R. A vignette is available here. The code was adapted from the MMVAE repository.

Check out our preprint for more details on the methods.