UniCell Deconvolve Paper

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).


This repository contains the notebooks and scripts used in the UniCell Deconvolve paper. They demonstrate how the figures, tables, and analyses presented in the paper were generated. Fully reproducing the analysis requires additional steps, including downloading large amounts of external data and changing file paths. All data required to reproduce the analysis are available upon reasonable request. Some of these files may be updated in the near future.

For tutorials on using UniCellDeconvolve, please see the full documentation available at https://ucdeconvolve.readthedocs.io/en/latest/ and download the software package at https://github.com/dchary/ucdeconvolve/tree/main/ucdeconvolve

Benchmarking Studies


The following table provides links to all datasets used in benchmarking of UCD.

| Dataset Description | Figure | Use | In UCD | Source | Link |
| --- | --- | --- | --- | --- | --- |
| 10K PBMC Healthy Donor | 2 | Mixture | No | 10X Genomics | source |
| 5K PBMC Healthy Donor | 2 | Reference | No | 10X Genomics | source |
| Wang et al. 2020 Lung | 2 | Mixture | No | cellxgene | source |
| Travaglini et al. 2020 Lung | 2 | Reference | Yes | cellxgene | source |
| Cowan et al. 2020 Retina Periphery | 2 | Mixture | No | cellxgene | source |
| Cowan et al. 2020 Retina Fovea | 2 | Reference | No | cellxgene | source |
| DREAM Bulk Deconvolution Challenge | S2 | Reference & Mixture | No | GSE199324 | source |
| Murine Kidney Injury | 3 | | No | rebuildingakidney (RBK) | source |
| Human Breast Cancer | 4 | | No | 10X Genomics | source |
| Human Prostate Cancer | 4 | | No | 10X Genomics | source |
| Human Colon Cancer | 4 | | No | 10X Genomics | source |
| Idiopathic Pulmonary Fibrosis | 5 | | No | GSE134692 | source |
| Type II Diabetes | 5 | | No | GSE50244 | source |
| Multiple Sclerosis | 5 | | No | GSE138614 | source |
| STARMap | S2 | | No | Qu Kun Lab | source |
| Ding et al. 2019 PBMC Technical Comparison | S2 | | No | GSE132044 | source |

Interpretable & Context-Free Deconvolution of Multi-Scale Whole Transcriptomic Data With UniCell Deconvolve

Authors: Daniel M. Charytonowicz, Rachel Brody, and Robert S. Sebra

We introduce UniCell: Deconvolve Base (UCDBase), a pre-trained, interpretable deep learning model that deconvolves cell type fractions and predicts cell identity across spatial, bulk-RNA-Seq, and scRNA-Seq datasets without contextualized reference data. UCD is trained on 10 million pseudo-mixtures from the world's largest fully integrated scRNA-Seq training database, comprising over 28 million annotated single cells spanning 840 unique cell types from 898 studies. We show that our UCDBase and transfer learning models (UCDSelect) achieve comparable or superior performance on in-silico mixture deconvolution relative to existing reference-based, state-of-the-art methods. Feature attribution analysis uncovers gene signatures associated with cell-type-specific inflammatory-fibrotic responses in ischemic kidney injury, discerns cancer subtypes, and accurately deconvolves tumor microenvironments. UCD identifies pathologic changes in cell fractions among bulk-RNA-Seq data for several disease states. Applied to novel lung cancer scRNA-Seq data, UCD annotates and distinguishes normal from cancerous cells. Overall, UCD enhances transcriptomic data analysis, aiding in the assessment of both cellular and spatial context.
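
To make the idea of a pseudo-mixture concrete, the following is a minimal sketch of how one labeled mixture could be built from annotated single-cell counts. It is not the procedure used in the paper: the function name `make_pseudo_mixture`, the flat Dirichlet prior over fractions, and the default of 100 cells per mixture are all assumptions chosen for illustration.

```python
import numpy as np


def make_pseudo_mixture(counts, labels, n_cells=100, seed=None):
    """Build one pseudo-bulk mixture and its ground-truth fractions.

    counts : (n_cells_total, n_genes) array of single-cell expression counts
    labels : (n_cells_total,) array of cell type annotations
    Illustrative sketch only; every sampling choice here is an assumption.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    cell_types = np.unique(labels)

    # Assumption: target proportions come from a flat Dirichlet distribution.
    target = rng.dirichlet(np.ones(len(cell_types)))

    # Convert proportions into an integer number of cells per type.
    n_per_type = rng.multinomial(n_cells, target)

    # Sum the expression of the sampled cells to form the mixture profile.
    mixture = np.zeros(counts.shape[1], dtype=float)
    for cell_type, n in zip(cell_types, n_per_type):
        if n == 0:
            continue
        candidates = np.flatnonzero(labels == cell_type)
        chosen = rng.choice(candidates, size=n, replace=True)
        mixture += counts[chosen].sum(axis=0)

    # The realized fractions serve as the training label for this mixture.
    fractions = n_per_type / n_cells
    return mixture, cell_types, fractions


# Toy usage: two cell types, each expressing a single marker gene.
toy_counts = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
toy_labels = np.array(["A", "A", "B", "B"])
toy_mixture, toy_types, toy_fractions = make_pseudo_mixture(
    toy_counts, toy_labels, n_cells=50, seed=0
)
```

In this toy setup each cell contributes exactly one count to its own marker gene, so the mixture profile divided by the number of sampled cells equals the ground-truth fractions. A deconvolution model is trained to recover such fractions from the mixture profile alone.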