This repository contains the notebooks and scripts used in the UniCell Deconvolve paper. They demonstrate how the figures, tables, and analyses presented in the paper were generated. Fully reproducing the analysis requires additional steps, including downloading large amounts of external data and updating file paths. All data required to reproduce the analysis are available upon reasonable request. Some of these files may be updated in the near future.
For tutorials on using UniCell Deconvolve, please see the full documentation at https://ucdeconvolve.readthedocs.io/en/latest/ and download the software package at https://github.com/dchary/ucdeconvolve/tree/main/ucdeconvolve
The following table provides links to all datasets used to benchmark UniCell Deconvolve (UCD).
Dataset Description | Figure | Use | In UCD | Source | Link |
---|---|---|---|---|---|
10K PBMC Healthy Donor | 2 | Mixture | No | 10X Genomics | source |
5K PBMC Healthy Donor | 2 | Reference | No | 10X Genomics | source |
Wang et al. 2020 Lung | 2 | Mixture | No | cellxgene | source |
Travaglini et al. 2020 Lung | 2 | Reference | Yes | cellxgene | source |
Cowan et al. 2020 Retina Periphery | 2 | Mixture | No | cellxgene | source |
Cowan et al. 2020 Retina Fovea | 2 | Reference | No | cellxgene | source |
DREAM Bulk Deconvolution Challenge | S2 | Reference & Mixture | No | GSE199324 | source |
Murine Kidney Injury | 3 | | No | rebuildingakidney (RBK) | source |
Human Breast Cancer | 4 | | No | 10X Genomics | source |
Human Prostate Cancer | 4 | | No | 10X Genomics | source |
Human Colon Cancer | 4 | | No | 10X Genomics | source |
Idiopathic Pulmonary Fibrosis | 5 | | No | GSE134692 | source |
Type II Diabetes | 5 | | No | GSE50244 | source |
Multiple Sclerosis | 5 | | No | GSE138614 | source |
STARMap | S2 | | No | Qu Kun Lab | source |
Ding et al. 2019 PBMC Technical Comparison | S2 | | No | GSE132044 | source |
Interpretable & Context-Free Deconvolution of Multi-Scale Whole Transcriptomic Data With UniCell Deconvolve
Authors: Daniel M. Charytonowicz, Rachel Brody, and Robert S. Sebra
We introduce UniCell: Deconvolve Base (UCDBase), a pre-trained, interpretable, deep learning model to deconvolve cell type fractions and predict cell identity across spatial, bulk-RNA-Seq, and scRNA-Seq datasets without contextualized reference data. UCD is trained on 10 million pseudo-mixtures from the world's largest fully-integrated scRNA-Seq training database, comprising over 28 million annotated single cells spanning 840 unique cell types from 898 studies. We show that our UCDBase and transfer learning models (UCDSelect) achieve comparable or superior performance on in-silico mixture deconvolution to existing, reference-based, state-of-the-art methods. Feature attribute analysis uncovers gene signatures associated with cell-type specific inflammatory-fibrotic responses in ischemic kidney injury, discerns cancer subtypes, and accurately deconvolves tumor microenvironments. UCD identifies pathologic changes in cell fractions among bulk-RNA-Seq data for several disease states. Applied to novel lung cancer scRNA-Seq data, UCD annotates and distinguishes normal from cancerous cells. Overall, UCD enhances transcriptomic data analysis, aiding in assessment of both cellular and spatial context.