mowgli_reproducibility

Data and scripts to reproduce experiments for Mowgli.

Primary language: Python

Mowgli: Multi Omics Wasserstein inteGrative anaLysIs

Mowgli is a novel method for integrating paired multi-omics data with any type and number of omics. It combines integrative Nonnegative Matrix Factorization and Optimal Transport. The method is described in the paper listed under Publication below.

This is the code used to perform the experiments and generate the figures in our manuscript. If you are looking for the Python package, note that it is distributed separately from this repository.

Code structure

  • enrich contains the code for the enrichment analysis
  • evaluate contains the code for computing the various evaluation metrics
  • integrate contains the code used to perform the integration with Mowgli, MOFA+, Seurat, Cobolt, Multigrate and integrative NMF
  • preprocess contains the preprocessing code
  • visualize contains the visualization code used to produce the figures of the paper

Data

For convenience, raw and processed data as well as the ground truth annotation are available at the following figshare link: https://figshare.com/s/1b13e12f33e83fff7e0e. Below you will find the original references.

Liu

  • Original publication: Liu, L. et al. Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity. Nat. Commun. 10, 470 (2019).
  • Relevant data: Supplementary Data 3 and Supplementary Data 4 of the original publication are .tsv files containing the chromatin accessibility and gene expression data, respectively.
  • Ground truth: Columns in these files contain cell line annotation.
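As a minimal sketch of how the annotation could be recovered from the column names of such a .tsv file, the snippet below reads the header and counts cells per label. It uses only the standard library. The layout (first column holding feature names) and the toy column names are assumptions for illustration, not the format of the actual Supplementary Data files; the rule mapping a column name to a cell line is passed in as a hypothetical `label_of` function so that no naming convention is hard-coded.

```python
import csv
import io
from collections import Counter
from typing import Callable, Iterable, List


def read_cell_columns(lines: Iterable[str]) -> List[str]:
    """Return the header of a tab-separated table without its first column.

    Assumption: the first column holds feature names (genes or peaks)
    and every other column corresponds to one cell.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    return header[1:]


def count_labels(columns: Iterable[str], label_of: Callable[[str], str]) -> Counter:
    """Count how many columns map to each label under the rule `label_of`."""
    return Counter(label_of(name) for name in columns)


# Toy table with made-up column names; the real files may be named differently.
toy = io.StringIO("feature\tA_1\tA_2\tB_1\nf1\t0\t3\t1\n")
cells = read_cell_columns(toy)

# Hypothetical rule: the label is the part of the name before the first "_".
counts = count_labels(cells, lambda name: name.split("_")[0])
print(cells)
print(dict(counts))
```

For a real file, `read_cell_columns` can be given an open file handle instead of the in-memory toy table, and `label_of` should be adapted after inspecting the actual column names.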

PBMC

BM CITE

OP CITE

  • Original publication: Luecken, M. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (eds. Vanschoren, J. & Yeung, S.) vol. 1 (2021).
  • GEO accession: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122 under name GSE194122_openproblems_neurips2021_cite_BMMC_processed.h5ad.gz
  • Ground truth: Cell type annotation is available in the linked file

OP Multiome

  • Original publication: Luecken, M. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (eds. Vanschoren, J. & Yeung, S.) vol. 1 (2021).
  • GEO accession: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122 under name GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad.gz
  • Ground truth: Cell type annotation is available in the linked file
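The GEO files for the OP CITE and OP Multiome datasets are gzip-compressed .h5ad files, so they need to be decompressed before most tools can open them. The helper below is a minimal sketch using only the standard library; the function name `gunzip` is hypothetical and is not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src) -> Path:
    """Decompress a .gz file next to itself and return the new path."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    # with_suffix("") strips only the trailing ".gz",
    # so "x.h5ad.gz" becomes "x.h5ad".
    dst = src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams in chunks, not all in memory
    return dst


# Example usage (assumes the file was downloaded to the working directory):
# path = gunzip("GSE194122_openproblems_neurips2021_multiome_BMMC_processed.h5ad.gz")
# The decompressed file can then be opened with, for instance,
# anndata.read_h5ad(path), provided the anndata package is installed.
```

Streaming with `shutil.copyfileobj` avoids loading the whole archive into memory, which matters for large single-cell datasets.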

TEA-seq

Publication

https://www.nature.com/articles/s41467-023-43019-2

@article{huizing2023paired,
  title={Paired single-cell multi-omics data integration with Mowgli},
  author={Huizing, Geert-Jan and Deutschmann, Ina Maria and Peyr{\'e}, Gabriel and Cantini, Laura},
  journal={Nature Communications},
  volume={14},
  number={1},
  pages={7711},
  year={2023},
  publisher={Nature Publishing Group UK London}
}