UniCell Deconvolve Paper

This repository contains the notebooks and scripts used in the UniCell Deconvolve paper, showing how its figures, tables, and analyses were generated. Fully reproducing the analysis requires additional steps, including downloading large amounts of external data and updating file paths. All data required to reproduce the analysis is available upon reasonable request. Some of these files may be updated in the near future.

For tutorials on using UniCell Deconvolve, see the full documentation at https://ucdeconvolve.readthedocs.io/en/latest/ and download the software package from https://github.com/dchary/ucdeconvolve/tree/main/ucdeconvolve

Benchmarking Studies

The following table provides links to all datasets used in benchmarking of UCD.

Dataset Description	Figure	Use	In UCD	Source	Link
10K PBMC Healthy Donor	2	Mixture	No	10X Genomics	source
5K PBMC Healthy Donor	2	Reference	No	10X Genomics	source
Wang et al. 2020 Lung	2	Mixture	No	cellxgene	source
Travaglini et al. 2020 Lung	2	Reference	Yes	cellxgene	source
Cowan et al. 2020 Retina Periphery	2	Mixture	No	cellxgene	source
Cowan et al. 2020 Retina Fovea	2	Reference	No	cellxgene	source
DREAM Bulk Deconvolution Challenge	S2	Reference & Mixture	No	GSE199324	source
Murine Kidney Injury	3		No	rebuildingakidney (RBK)	source
Human Breast Cancer	4		No	10X Genomics	source
Human Prostate Cancer	4		No	10X Genomics	source
Human Colon Cancer	4		No	10X Genomics	source
Idiopathic Pulmonary Fibrosis	5		No	GSE134692	source
Type II Diabetes	5		No	GSE50244	source
Multiple Sclerosis	5		No	GSE138614	source
STARMap	S2		No	Qu Kun Lab	source
Ding et al. 2019 PBMC Technical Comparison	S2		No	GSE132044	source
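The "Use" column separates mixture datasets from reference datasets. When a mixture is presumably assembled from annotated cells, its true cell-type fractions are known, so a model's predicted fractions can be scored against them directly. The sketch below is a minimal, hypothetical illustration of that comparison. It assumes Pearson correlation and RMSE as the metrics, which may not match those reported in the paper. The function names `score_fractions` and `pearson_r` are illustrative and are not part of the ucdeconvolve package.

```python
import math
from typing import Dict, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def score_fractions(true: Dict[str, float], pred: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical scorer comparing known and predicted cell-type fractions for one mixture.

    Cell types present in only one of the two dicts are treated as fraction 0 in the other.
    Metrics (Pearson r, RMSE) are assumptions, not necessarily the paper's choice.
    """
    cell_types = sorted(set(true) | set(pred))
    t = [true.get(c, 0.0) for c in cell_types]
    p = [pred.get(c, 0.0) for c in cell_types]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, p)) / len(cell_types))
    return {"pearson_r": pearson_r(t, p), "rmse": rmse}
```

For example, passing the known fractions of a PBMC mixture as `true` and a model's output as `pred` returns both metrics in one dict. A `nan` correlation signals that one of the vectors is constant, which can happen when a mixture contains only one cell type.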

Interpretable & Context-Free Deconvolution of Multi-Scale Whole Transcriptomic Data With UniCell Deconvolve

Authors: Daniel M. Charytonowicz, Rachel Brody, and Robert S. Sebra

We introduce UniCell: Deconvolve Base (UCDBase), a pre-trained, interpretable deep learning model that deconvolves cell-type fractions and predicts cell identity across spatial, bulk RNA-Seq, and scRNA-Seq datasets without contextualized reference data. UCD is trained on 10 million pseudo-mixtures drawn from the world's largest fully integrated scRNA-Seq training database, which comprises over 28 million annotated single cells spanning 840 unique cell types from 898 studies. We show that UCDBase and our transfer learning models (UCDSelect) achieve performance comparable or superior to existing reference-based, state-of-the-art methods on in-silico mixture deconvolution. Feature attribution analysis uncovers gene signatures associated with cell-type-specific inflammatory-fibrotic responses in ischemic kidney injury, discerns cancer subtypes, and accurately deconvolves tumor microenvironments. UCD identifies pathologic changes in cell fractions in bulk RNA-Seq data from several disease states. Applied to novel lung cancer scRNA-Seq data, UCD annotates cells and distinguishes normal from cancerous cells. Overall, UCD enhances transcriptomic data analysis, aiding in the assessment of both cellular and spatial context.
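The abstract states that UCD is trained on pseudo-mixtures built from annotated single cells. As a rough illustration of the idea only (this is not the authors' training pipeline), the sketch below builds one pseudo-mixture. It assumes target proportions drawn from a symmetric Dirichlet distribution, a fixed number of cells per mixture, and mixture expression computed as the plain sum of the sampled cells' profiles. The function `make_pseudo_mixture` and its parameters `n_cells` and `alpha` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def make_pseudo_mixture(
    cells_by_type: Dict[str, List[Sequence[float]]],
    n_cells: int = 100,
    alpha: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical pseudo-mixture generator (assumed scheme, not UCD's actual one).

    cells_by_type maps a cell-type label to a list of per-cell expression vectors
    (all vectors share the same gene order). Returns the summed mixture profile
    and the realized fraction of sampled cells of each type.
    """
    rng = rng or random.Random()
    types = sorted(cells_by_type)
    # Draw target proportions from a symmetric Dirichlet(alpha) by normalizing gamma variates.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in types]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Assign each of the n_cells sampled cells a type according to those proportions.
    drawn = rng.choices(types, weights=weights, k=n_cells)
    n_genes = len(cells_by_type[types[0]][0])
    mixture = [0.0] * n_genes
    for cell_type in drawn:
        # Pick one cell of that type at random and add its profile to the mixture.
        profile = rng.choice(cells_by_type[cell_type])
        for g, value in enumerate(profile):
            mixture[g] += value
    # Ground-truth labels are the fractions actually sampled, not the Dirichlet targets.
    counts = Counter(drawn)
    fractions = {t: counts[t] / n_cells for t in types}
    return mixture, fractions
```

Recording the realized fractions rather than the Dirichlet targets keeps each label exactly consistent with the cells that were summed into its mixture. Repeating the call many times would yield a training set of (mixture, fractions) pairs of the general kind the abstract describes.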
