Case-Control Analysis of scRNA-seq experiments
The package implements methods described in this pre-print. To reproduce results from the paper, please see this repository.
To install the latest version, use:
install.packages('devtools')
devtools::install_github('kharchenkolab/cacoa')
Prior to installing the package, dependencies have to be installed:
BiocManager::install(c("clusterProfiler", "DESeq2", "DOSE", "EnhancedVolcano", "enrichplot", "fabia", "GOfuncR", "Rgraphviz"))
Also make sure to install the latest version of sccore (not the one from CRAN):
devtools::install_github("kharchenkolab/sccore", ref="dev")
Cacoa currently supports inputs in several formats (see below). Most of them require the following metadata:
sample.groups
: vector with condition labels per sample named with sample idscell.groups
: cell type annotation vector named by cell idssample.per.cell
: vector with sample labels per cell named with cell idsref.level
: id of the condition, corresponding to the reference (i.e. control)target.level
: id of the condition, corresponding to the target (i.e. case)
Additionally, embedding
parameter containing a matrix or data.frame with a cell embedding can be provided. Rownames should match to the cell ids.
It is used for visualization and some cluster-free analysis.
Cacoa can be ran without any expression data by passing NULL
instead of a data object:
cao <- Cacoa$new(NULL, sample.groups=sample.groups, cell.groups=cell.groups, sample.per.cell=sample.per.cell,
ref.level=ref.level, target.level=target.level, embedding=embedding)
In this case, only compositional analyses will be available.
cao <- Cacoa$new(cm, sample.groups=sample.groups, cell.groups=cell.groups, sample.per.cell=sample.per.cell,
ref.level=ref.level, target.level=target.level, embedding=embedding)
cao <- Cacoa$new(so, sample.groups=sample.groups, cell.groups=cell.groups, sample.per.cell=sample.per.cell,
ref.level=ref.level, target.level=target.level, graph.name=graph.name)
Parameter graph.name
is required for cluster-free analysis, and must contain a name of joint graph in Seurat object. For that, the Seurat object must have a joint graph estimated (see FindNeighbors). For visualization purposes, Seurat also must have cell embedding estimated or the embedding data frame must be provided in the embedding
parameter.
cao <- Cacoa$new(co, sample.groups=sample.groups, cell.groups=cell.groups,
ref.level=ref.level, target.level=target.level)
For visualization purposes, Conos must have cell embedding estimated or the embedding data frame must be provided in the embedding
parameter. And for cluster-free analysis it should have a joint graph (see the method Conos$buildGraph()
from conos method).
Cacoa can estimate and visualize various statistics. Most of them have paired functions cao$estimateX(...)
and cao$plotX(...)
(for example, cao$estimateCellLoadings()
and cao$plotCellLoadings()
). Results of all estimation are stored in cao$test.results
, and their exact name can be controlled by name
parameter passed to cao$estimateX()
. For example, calling cao$estimateExpressionShiftMagnitudes(name='es')
would save the results in cao$test.results$es
.
Please, see the documentation for exact functions inside the package. For a demonstration see the vignette (code). Additionally, the cacoaAnalysis repository contains analysis conducted inside the paper, though the Cacoa version there may be out of date.
If you find this pipeline useful for your research, please consider citing the pre-pring:
Case-control analysis of single-cell RNA-seq studies Viktor Petukhov, Anna Igolkina, Rasmus Rydbirk, Shenglin Mei, Lars Christoffersen, Konstantin Khodosevich, Peter V. Kharchenko bioRxiv 2022.03.15.484475; doi: https://doi.org/10.1101/2022.03.15.484475