deconvolution-papers

A number of algorithms have recently been developed for deconvolution of cell types in bulk RNA-seq samples based on cell type signatures from matched single-cell RNA-seq samples or other marker gene lists.

This repository lists recent method papers, benchmark papers, and application papers using deconvolution algorithms, along with some of our comments.

Method papers

Method	Reference	Availability	Comments
Bisque	Jew et al. (2020), Nature Communications	R package from GitHub	Compares pseudo-bulk profiles (obtained by summing single cells) with the actual bulk data and then applies a transformation. Methods section could be improved.
CIBERSORT	Newman et al. (2015), Nature Methods	Online tool requiring signup and login	Requires an input matrix of reference gene expression signatures for cell types. The methodology is based on linear support vector regression, which also performs feature selection on the genes in the signature matrix. Also returns an overall p-value for the null hypothesis that no cell types from the signature matrix are present. Extensively benchmarked on datasets with both simulated and experimental (e.g. flow-sorted) ground truth, and compared against several earlier methods (using microarray data only). Widely used in application papers, but not easily accessible or reproducible, since it is only available as an online tool requiring signup and login.
CIBERSORTx	Newman et al. (2019), Nature Biotechnology	Online tool requiring signup and login	Extension of the earlier CIBERSORT tool. Not easily accessible or reproducible, since it is only available as an online tool requiring signup and login. Free for academic use only (not commercial), which limits its usefulness.
DWLS	Tsoucas et al. (2019), Nature Communications	R package from BitBucket
EPIC	Racle et al. (2017), eLife	R package from GitHub, as well as web application
MuSiC	Wang et al. (2019), Nature Communications	R package from GitHub
Scaden	Menden et al. (2020), Science Advances	Python package from Bioconda and PyPI, as well as web application
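
Several comments in the table above refer to two building blocks: pseudo-bulk profiles (summing counts over single cells) and regression of a bulk profile on a matrix of cell type signatures. The sketch below illustrates these two ideas on toy data. It is not the algorithm of Bisque, CIBERSORT, or any other package listed here: it uses plain non-negative least squares solved by projected gradient descent as a stand-in, and all function names and numbers are hypothetical.

```python
# Illustrative sketch only: toy pseudo-bulk construction and a simple
# non-negative least squares (NNLS) deconvolution. This is NOT the method
# implemented by any package in the table; names and data are made up.


def pseudo_bulk(cells):
    """Sum per-gene counts over single cells (each cell is a list of counts)."""
    return [sum(gene_counts) for gene_counts in zip(*cells)]


def signature_matrix(cells_by_type):
    """Mean expression per gene for each cell type."""
    return {
        cell_type: [sum(g) / len(cells) for g in zip(*cells)]
        for cell_type, cells in cells_by_type.items()
    }


def nnls_proportions(bulk, signatures, n_iter=2000):
    """Minimize ||S w - bulk||^2 subject to w >= 0, then rescale w to sum to 1."""
    types = list(signatures)
    n_genes = len(bulk)
    # 1 / trace(S^T S) is a conservative step size (trace >= largest eigenvalue).
    step = 1.0 / sum(v * v for t in types for v in signatures[t])
    w = [1.0 / len(types)] * len(types)
    for _ in range(n_iter):
        fitted = [
            sum(w[k] * signatures[t][g] for k, t in enumerate(types))
            for g in range(n_genes)
        ]
        resid = [fitted[g] - bulk[g] for g in range(n_genes)]
        grad = [
            sum(signatures[t][g] * resid[g] for g in range(n_genes))
            for t in types
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(len(types))]
    total = sum(w)
    return {t: (w[k] / total if total > 0 else 0.0) for k, t in enumerate(types)}


# Toy reference single-cell data: 3 genes, 2 hypothetical cell types.
reference = {
    "A": [[10, 0, 2], [12, 0, 2]],
    "B": [[0, 8, 2], [0, 10, 2]],
}
signatures = signature_matrix(reference)

# Toy "bulk" sample built as a pseudo-bulk of three A-like cells and one B-like cell.
bulk = pseudo_bulk([[10, 0, 2], [12, 0, 2], [11, 0, 2], [0, 9, 2]])

props = nnls_proportions(bulk, signatures)
print(props)
```

Because the toy bulk sample is a mixture of three A-like cells and one B-like cell, the estimated proportions should come out close to 0.75 for A and 0.25 for B. Real methods add steps this sketch omits, such as gene selection, normalization, and handling of technical differences between single-cell and bulk data.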

Benchmark papers

Paper	Description
Patrick et al. (2020), PLOS Computational Biology	Benchmark of deconvolution algorithms to estimate cell type proportions in brain tissue. Experimentally generated immunohistochemistry (IHC) benchmark dataset (5 major cell types, 70 individuals, with matched bulk cortical gene expression data). Deconvolution algorithms are run using known marker gene sets. Algorithms in benchmark include: NNLS, CIBERSORT, dtangle, DSA, MuSiC, BSEQ-sc.
Cobos et al. (2020), bioRxiv
Huang et al. (2020), bioRxiv	Benchmark of 8 methods developed for single-cell RNA-seq (Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, SCINA) and 2 methods developed for DNA methylation (DNAm) data (Linear Constrained Projection (CP) and Robust Partial Correlations (RPC)).
Sturm et al. (2019), Bioinformatics	Benchmark with reproducible code provided as a Snakemake pipeline. Uses datasets consisting of immune cell populations in the tumor microenvironment (TME).
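
Benchmarks of this kind compare estimated cell type proportions against a ground truth such as IHC measurements. As a rough illustration of how such a comparison can be summarized, the sketch below computes Pearson correlation and root mean squared error (RMSE) for one cell type across samples. These two metrics are an assumption chosen for illustration; the papers above may use other measures, and the numbers are made up.

```python
# Illustrative sketch only: summarizing agreement between estimated and
# ground-truth proportions. Metric choice and data are assumptions.
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical proportions of one cell type in four samples.
truth = [0.10, 0.20, 0.30, 0.40]
estimated = [0.15, 0.25, 0.35, 0.45]

r = pearson(truth, estimated)
err = rmse(truth, estimated)
print(r, err)
```

In this toy case every estimate is shifted upward by the same amount, so the correlation is essentially perfect while the RMSE is not zero. This is one reason to report both a correlation and an absolute error measure rather than correlation alone.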

Application papers

These are mainly from the field of ovarian cancer, since this is the application area we are working on.

Paper	Description
Hu et al. (2020), Cancer Cell	Non-genetic heterogeneity (NGH) within tumors (i.e. heterogeneity related to different cell types, as opposed to genetic heterogeneity due to mutations) is associated with tumor resilience but is difficult to quantify. The authors perform scRNA-seq on 4000 normal fallopian tube epithelial (FTE) cells from 11 donors to identify 6 FTE subpopulations; FTE cells are the cells of origin for serous ovarian cancer (SOC). They then use the gene expression signatures of 5 FTE subtypes to deconvolve SOC expression data from bulk samples, in order to identify NGH within tumors and to stratify subtypes of SOC based on NGH. Cell types within the scRNA-seq data are identified by clustering, followed by differential-expression-based annotation using known marker genes (note that they used a customized clustering algorithm). Deconvolution is done using the CIBERSORT algorithm. The stratified SOC subtypes are shown to predict survival in a dataset of 1700 bulk tumor samples from The Cancer Genome Atlas (TCGA) and the Australian Ovarian Cancer Study (AOCS). An open question is whether NGH is due to multiple cell types of origin for the tumor, to differentiation within the tumor, or to a combination of both.
Schwede et al. (2020), Cancer Epidemiology, Biomarkers & Prevention	Main conclusions: Demonstrates that (1) HGSOC subtype identification depends on the ratio of tumor to stroma within the specimen and (2) the anatomic location of the biopsy may influence the proportion of stromal involvement and potentially the resulting gene expression pattern. It is also important to define the relative proportions of stromal cells and model their prognostic importance in the tumor microenvironment. Summary of data analyses: Identified a "stromal gene set" and a "tumor gene set" using two datasets (AOCS n=8 and TCGA n=38 paired tumors). Found overlap in genes between the two datasets (23 tumor genes and 125 stroma genes). Performed unsupervised clustering: found that microdissected tumor samples cluster with the bulk C4/C5 subtypes and microdissected stroma samples cluster with the bulk C1/C2 subtypes. Found that gene signature classifiers for AOCS and TCGA molecular subtypes are not stable when the proportion of stromal cells changes. Found that pathologist scores of percent stromal content were associated with overall survival (Cox PH) in high-stage (III or IV) tumors. Using n=61 published prognostic ovarian gene signatures from GeneSigDB, found that 24/61 were enriched for AOCS stromal genes and 11/61 were enriched for MGH stroma genes; 8 sets of signatures were strongly enriched in both datasets. The stroma signature is not specific to HGSOC: these signatures were also found in breast and prostate. The sampling location of tumors (extra-ovarian vs ovary/pelvis) impacts the stroma's prognostic power.


lmweber/deconvolution-papers
