cacoa

Case-Control Analysis of scRNA-seq experiments

The package implements methods described in this pre-print. To reproduce results from the paper, please see this repository.

Installation

To install the latest version, use:

install.packages('devtools')
devtools::install_github('kharchenkolab/cacoa')

Before installing the package, install its dependencies with BiocManager:

BiocManager::install(c("clusterProfiler", "DESeq2", "DOSE", "EnhancedVolcano", "enrichplot", "fabia", "GOfuncR", "Rgraphviz"))

Also make sure to install the latest version of sccore (not the one from CRAN):

devtools::install_github("kharchenkolab/sccore", ref="dev")

Initialization

Cacoa currently supports inputs in several formats (see below). Most of them require the following metadata:

sample.groups: vector with condition labels per sample named with sample ids
cell.groups: cell type annotation vector named by cell ids
sample.per.cell: vector with sample labels per cell named with cell ids
ref.level: id of the condition, corresponding to the reference (i.e. control)
target.level: id of the condition, corresponding to the target (i.e. case)

Additionally, the embedding parameter can be used to provide a matrix or data.frame with a cell embedding. Its rownames should match the cell ids. The embedding is used for visualization and for some cluster-free analyses.
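As a minimal sketch of what these inputs look like, the following base R snippet builds toy versions of each metadata object and checks that their names are consistent. All sample ids, cell ids, cell type labels and embedding column names are hypothetical and serve only to illustrate the expected shapes.

```r
# Toy metadata for illustration only: all ids and labels here are made up.
set.seed(42)

sample.ids <- c("ctrl_1", "ctrl_2", "case_1", "case_2")

# sample.groups: condition label per sample, named with sample ids
sample.groups <- c("control", "control", "case", "case")
names(sample.groups) <- sample.ids

# Ten cells per sample; cell ids have the form "<sample>_cell<i>"
n.cells.per.sample <- 10
cell.ids <- paste0(rep(sample.ids, each = n.cells.per.sample), "_cell",
                   rep(seq_len(n.cells.per.sample), times = length(sample.ids)))

# sample.per.cell: sample label per cell, named with cell ids
sample.per.cell <- rep(sample.ids, each = n.cells.per.sample)
names(sample.per.cell) <- cell.ids

# cell.groups: cell type annotation per cell, named with cell ids
cell.groups <- sample(c("Neuron", "Astrocyte", "Microglia"),
                      length(cell.ids), replace = TRUE)
names(cell.groups) <- cell.ids

ref.level <- "control"
target.level <- "case"

# Optional embedding: one row per cell, rownames are cell ids
embedding <- matrix(rnorm(2 * length(cell.ids)), ncol = 2,
                    dimnames = list(cell.ids, c("UMAP1", "UMAP2")))

# Consistency checks between the objects
stopifnot(
  setequal(names(cell.groups), names(sample.per.cell)),
  all(sample.per.cell %in% names(sample.groups)),
  all(c(ref.level, target.level) %in% sample.groups),
  all(rownames(embedding) %in% names(cell.groups))
)
```

Running checks like these before creating the object can catch mismatched ids early.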

No expression data

Cacoa can be run without any expression data by passing NULL instead of a data object:

cao <- Cacoa$new(NULL, sample.groups=sample.groups, cell.groups=cell.groups, sample.per.cell=sample.per.cell, 
                 ref.level=ref.level, target.level=target.level, embedding=embedding)

In this case, only compositional analyses will be available.
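To see why the annotation vectors alone are enough for this kind of analysis, here is a base R sketch that computes per-sample cell type proportions from toy sample.per.cell and cell.groups vectors. It is only an illustration of the input to compositional analysis, not Cacoa's own method, and all ids and labels are hypothetical.

```r
# Toy annotation (hypothetical ids): 3 samples, 2 cell types
sample.per.cell <- c(c1 = "s1", c2 = "s1", c3 = "s1", c4 = "s1",
                     c5 = "s2", c6 = "s2", c7 = "s2", c8 = "s2",
                     c9 = "s3", c10 = "s3")
cell.groups <- c(c1 = "A", c2 = "A", c3 = "A", c4 = "B",
                 c5 = "A", c6 = "B", c7 = "B", c8 = "B",
                 c9 = "A", c10 = "B")

# Align the two vectors by cell id before cross-tabulating
cell.ids <- intersect(names(sample.per.cell), names(cell.groups))
counts <- table(sample = sample.per.cell[cell.ids],
                cell.type = cell.groups[cell.ids])

# Fraction of each cell type within each sample (rows sum to 1)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```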

Raw or normalized joint count matrix `cm`

cao <- Cacoa$new(cm, sample.groups=sample.groups, cell.groups=cell.groups, sample.per.cell=sample.per.cell, 
                 ref.level=ref.level, target.level=target.level, embedding=embedding)

Seurat object `so`

cao <- Cacoa$new(so, sample.groups=sample.groups, cell.groups=cell.groups, sample.per.cell=sample.per.cell, 
                 ref.level=ref.level, target.level=target.level, graph.name=graph.name)

The graph.name parameter is required for cluster-free analysis and must contain the name of a joint graph in the Seurat object, so the object must have a joint graph estimated (see FindNeighbors). For visualization, the Seurat object must also have a cell embedding estimated, or an embedding data frame must be provided in the embedding parameter.

Conos object `co`

cao <- Cacoa$new(co, sample.groups=sample.groups, cell.groups=cell.groups, 
                 ref.level=ref.level, target.level=target.level)

For visualization, the Conos object must have a cell embedding estimated, or an embedding data frame must be provided in the embedding parameter. For cluster-free analysis, it should also have a joint graph (see the Conos$buildGraph() method from the conos package).

Usage

Cacoa can estimate and visualize various statistics. Most of them have paired functions cao$estimateX(...) and cao$plotX(...) (for example, cao$estimateCellLoadings() and cao$plotCellLoadings()). Results of all estimations are stored in cao$test.results, and the name of each entry can be controlled by the name parameter passed to cao$estimateX(). For example, calling cao$estimateExpressionShiftMagnitudes(name='es') saves the results in cao$test.results$es.

Please see the package documentation for the exact functions. For a demonstration, see the vignette (code). Additionally, the cacoaAnalysis repository contains the analyses from the paper, though the Cacoa version used there may be out of date.

Citation

If you find this pipeline useful for your research, please consider citing the pre-print:

Case-control analysis of single-cell RNA-seq studies. Viktor Petukhov, Anna Igolkina, Rasmus Rydbirk, Shenglin Mei, Lars Christoffersen, Konstantin Khodosevich, Peter V. Kharchenko. bioRxiv 2022.03.15.484475; doi: https://doi.org/10.1101/2022.03.15.484475
