
Loss-Function Learning for Digital Tissue Deconvolution


DTD

Digital Tissue Deconvolution (DTD) reconstructs the cellular composition of a tissue from its bulk expression profile. To increase deconvolution accuracy, DTD adapts the deconvolution model to the tissue scenario via loss-function learning.
Training is performed on in-silico training mixtures, for which the cellular compositions are known.
As input, DTD requires a labelled expression matrix. The package includes functions to generate training and test mixtures, train the model, and assess its deconvolution capability via visualizations.
The underlying theory was published in Görtler et al. 2018, "Loss-Function Learning for Digital Tissue Deconvolution".

An exemplary analysis can be viewed at https://github.com/MarianSchoen/Exemplary-DTD-analysis

Install

Install from GitHub, without the vignette:

  devtools::install_github("spang-lab/DTD")

I strongly recommend building the vignette. Therefore, install from GitHub with the vignette
(building the vignettes takes approximately 3 minutes):

  devtools::install_github(
    "spang-lab/DTD",
    build_opts = c("--no-resave-data", "--no-manual"),
    build_vignettes = TRUE
  )
  browseVignettes("DTD")

Introduction to DTD Theory

The gene expression profile of a tissue combines the expression profiles of all cells in this tissue. Digital tissue deconvolution (DTD) addresses the following inverse problem: given the bulk expression profile y of a tissue and a reference matrix X of cell-type-specific expression profiles, what is the cellular composition c of that tissue? The cellular composition c can be estimated by least squares:

  c(y) = arg min_c || y - X c ||^2    (1)
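In the simplest case, c is obtained by ordinary least squares, i.e. by minimizing || y - X c ||^2 over c. A minimal base-R sketch with toy data (all names and values are made up for illustration):

```r
set.seed(1)
n_genes <- 50
# toy reference matrix X: one column of expression per cell type
X <- matrix(rexp(n_genes * 3), nrow = n_genes,
            dimnames = list(NULL, c("T", "B", "NK")))
c_true <- c(T = 0.5, B = 0.3, NK = 0.2)  # true cellular composition
y <- as.vector(X %*% c_true)             # noise-free bulk profile

# least-squares estimate via the normal equations: (X'X) c = X'y
c_hat <- as.vector(solve(t(X) %*% X, t(X) %*% y))
names(c_hat) <- colnames(X)
round(c_hat, 3)  # recovers c_true: T = 0.5, B = 0.3, NK = 0.2
```

With a noise-free bulk profile the estimate is exact; real bulk profiles contain noise and unmodelled cell types, which motivates the weighting introduced below.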

Görtler et al. (2019) generalized this formula by introducing a gene-weight vector g:

  c(g, y) = arg min_c Σ_i g_i ( y_i - (X c)_i )^2    (2)

Each entry g[i] of g encodes how important gene i is for the deconvolution process. It can either be set via prior knowledge or learned from training data. The training data consist of artificial bulk profiles Y and the corresponding cellular compositions C; we generate these data from single-cell RNA-Seq profiles.
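Such training mixtures can be simulated by averaging randomly drawn single-cell profiles and recording the drawn cell-type fractions as the true composition. A toy sketch (this is not the package's mixture API; all names and data are made up):

```r
set.seed(2)
n_genes <- 100
# toy single-cell profiles: 60 labelled cells from three cell types
labels <- sample(c("T", "B", "NK"), size = 60, replace = TRUE)
sc <- matrix(rexp(n_genes * 60), nrow = n_genes,
             dimnames = list(NULL, labels))

# one artificial bulk profile y: average randomly picked cells,
# and record the true cellular composition c of the draw
generate_mixture <- function(sc, n_cells = 25) {
  idx <- sample(ncol(sc), size = n_cells, replace = TRUE)
  comp <- table(factor(colnames(sc)[idx],
                       levels = c("T", "B", "NK"))) / n_cells
  list(y = rowMeans(sc[, idx, drop = FALSE]), c = comp)
}

mix <- generate_mixture(sc)
```

Repeating this draw yields the matrix Y of artificial bulk profiles and the matrix C of known compositions used for training.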
The underlying idea of loss-function learning DTD is to obtain the vector g by minimizing a loss function L on the training set, the negative sum of per-cell-type correlations between true and estimated compositions:

  L(g) = - Σ_j cor( C_j,. , Ĉ_j,.(g) )    (3)

Here, Ĉ(g) is the solution of formula (2) for all training mixtures. During training, we iteratively adjust the vector g in the direction of the negative gradient ∇_g L, leading to a g vector whose cellular estimates correlate best with the known cellular compositions C.
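The whole training loop can be sketched in a few lines of base R with a numerical gradient; this is a didactic sketch on toy data, not the package's implementation (which uses an analytic gradient and an optimized solver), and all names are made up:

```r
set.seed(3)
n_genes <- 30; n_types <- 3; n_mix <- 40

# toy training data: reference X, compositions C, noisy bulk profiles Y
X <- matrix(rexp(n_genes * n_types), nrow = n_genes)
C <- matrix(runif(n_types * n_mix), nrow = n_types)
C <- sweep(C, 2, colSums(C), "/")                  # columns sum to 1
Y <- X %*% C + matrix(rnorm(n_genes * n_mix, sd = 0.2), nrow = n_genes)

# g-weighted least squares (formula (2)); g * X scales each gene row
deconvolve <- function(Y, X, g) {
  solve(t(X) %*% (g * X), t(X) %*% (g * Y))        # one column per mixture
}
# loss: negative sum of per-cell-type correlations of C with C_hat(g)
loss <- function(g) {
  C_hat <- deconvolve(Y, X, g)
  -sum(sapply(seq_len(n_types), function(j) cor(C[j, ], C_hat[j, ])))
}

# gradient descent with a numerical gradient and backtracking step size
g <- rep(1, n_genes); step <- 1; eps <- 1e-5
for (it in 1:30) {
  grad <- sapply(seq_len(n_genes), function(i) {
    gp <- g; gp[i] <- gp[i] + eps
    (loss(gp) - loss(g)) / eps
  })
  g_new <- pmax(g - step * grad, 1e-3)             # keep weights positive
  if (loss(g_new) < loss(g)) g <- g_new else step <- step / 2
}
```

By construction the loop only accepts steps that decrease the loss, so the learned g performs at least as well on the training set as the uniform weighting it starts from.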

License

All source code and documentation can be freely used and are available under an MIT license.