simspec: Similarity Spectrum

An R package to calculate representations of cells in single-cell genomic data, based on their similarities to external references (RSS) or to cell clusters in the data (CSS). More details of the method are available in the paper CSS: cluster similarity spectrum integration of single-cell genomics data. The manuscript is also available on bioRxiv.

Recent updates

(221101)

Implement estimate_projection_failure function to estimate failure likelihood of data projection to the given reference for each query cell
Add verbose messages to the transfer_labels function
Support providing cluster labels instead of doing clustering per sample from scratch in cluster_sim_spectrum
Update verbose message

(220622)

Add min_cluster_num parameter to cluster_sim_spectrum function to exclude samples with too few clusters from the ref profiles
Support ref_sim_spectrum function to output as a new assay in the Seurat object
Update verbose message

(211124)

Sparse matrix ranking for Spearman correlation coefficient to speed up calculation and avoid conversion to dense matrix
Faster kNN-based label projection

Installation

install.packages("devtools")
devtools::install_github("quadbiolab/simspec")

Usage

A more detailed vignette is available at https://github.com/quadbiolab/simspec/blob/master/vignette/vignette.md.

The code used to generate the results reported in the paper is deposited at https://github.com/quadbiolab/simspec/blob/master/code_repository/. Data can be retrieved from Mendeley Data (http://doi.org/10.17632/3kthhpw2pd).

Reference Similarity Spectrum (RSS)

To calculate RSS, two inputs are required:

Expression matrix of the data (expr)
Expression matrix of the reference (ref)

RSS <- ref_sim_spectrum(expr, ref)
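
To make the idea of a similarity spectrum concrete, here is a minimal base-R sketch on simulated data. It is not the package's implementation: it assumes that the representation is a correlation of each cell with each reference profile, followed by a z-transformation across references for every cell. The toy matrices, the choice of Spearman correlation, and the name rss_sketch are all assumptions made for illustration.

```r
set.seed(1)
n_genes <- 50
n_cells <- 20
n_refs <- 5
genes <- paste0("gene", seq_len(n_genes))

# Toy inputs (assumption): genes in rows, cells or reference samples in columns
expr <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
               dimnames = list(genes, paste0("cell", seq_len(n_cells))))
ref <- matrix(rpois(n_genes * n_refs, lambda = 5), nrow = n_genes,
              dimnames = list(genes, paste0("ref", seq_len(n_refs))))

# Only genes present in both matrices can be compared
shared <- intersect(rownames(expr), rownames(ref))

# Similarity of every cell to every reference profile (cells x references)
sim <- cor(expr[shared, ], ref[shared, ], method = "spearman")

# Z-transform each cell's similarities across the references
rss_sketch <- t(scale(t(sim)))
dim(rss_sketch)
```

Each row of rss_sketch describes one cell by its relative similarity to the reference profiles, so the number of columns equals the number of references.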

A Seurat object can also be used as the input. In that case, an updated Seurat object with an additional dimension reduction ('rss' by default) is returned.

seurat <- ref_sim_spectrum(seurat, ref)
seurat <- RunUMAP(seurat, reduction = "rss", dims = 1:ncol(Embeddings(seurat, "rss")))
seurat <- FindNeighbors(seurat, reduction = "rss", dims = 1:ncol(Embeddings(seurat, "rss")))
seurat <- FindClusters(seurat)
UMAPPlot(seurat)

Cluster Similarity Spectrum (CSS)

To calculate CSS, two inputs are required:

Expression matrix of the data (expr)
Labels indicating samples (labels)

CSS <- cluster_sim_spectrum(expr, labels = labels)
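
The following base-R sketch illustrates the general idea behind a cluster similarity spectrum on simulated data. It is a simplified stand-in, not the package's code: hierarchical clustering replaces whatever clustering the package performs per sample, the number of clusters k is a hypothetical fixed value, and the correlation and per-sample z-transformation steps are assumptions.

```r
set.seed(1)
n_genes <- 40
n_cells <- 60

# Toy inputs (assumption): genes in rows, cells in columns, one sample label per cell
expr <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))
labels <- rep(c("sampleA", "sampleB"), each = 30)

k <- 3  # hypothetical number of clusters per sample

# Cluster the cells of each sample separately and average expression per cluster
profiles <- lapply(split(seq_len(n_cells), labels), function(idx) {
  # Stand-in clustering: hierarchical clustering on log-transformed counts
  cl <- cutree(hclust(dist(t(log1p(expr[, idx])))), k = k)
  # One average expression profile per cluster (genes x clusters)
  sapply(split(idx, cl), function(i) rowMeans(expr[, i, drop = FALSE]))
})

# Correlate every cell with the cluster profiles of every sample,
# z-transform within each sample, then concatenate the blocks
css_sketch <- do.call(cbind, lapply(profiles, function(p) {
  sim <- cor(expr, p, method = "spearman")
  t(scale(t(sim)))
}))
dim(css_sketch)
```

Because similarities are normalized separately for each sample's clusters, every cell is described relative to each sample in turn, which is what lets cells from different samples share one representation.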

Similarly, a Seurat object can be used as the input. In that case, provide the name of the meta.data column that contains the sample labels. Note: the Seurat object is expected to have variable features defined and PCA already run.

seurat <- cluster_sim_spectrum(seurat, label_tag = "sample")
seurat <- RunUMAP(seurat, reduction = "css", dims = 1:ncol(Embeddings(seurat, "css")))
seurat <- FindNeighbors(seurat, reduction = "css", dims = 1:ncol(Embeddings(seurat, "css")))
seurat <- FindClusters(seurat)
UMAPPlot(seurat)

CSS for query data projection

The CSS representation allows simple and straightforward projection of query data to a reference atlas. To do so, cluster_sim_spectrum must return the CSS representation model of the reference data rather than only the CSS matrix or the updated Seurat object.

# From an expression matrix
model <- cluster_sim_spectrum(expr_ref, labels = labels_ref, return_css_only = FALSE)
# Or, from a Seurat object
model <- cluster_sim_spectrum(seurat_ref, label_tag = "sample", return_seuratObj = FALSE)

The model is then used to project query data to the same CSS space:

# Query given as an expression matrix
css_query <- css_project(expr_query, model)
# Query given as a Seurat object
seurat_query <- css_project(seurat_query, model)
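
As a rough illustration of why a stored model is enough for projection, the base-R sketch below compares query cells with reference cluster profiles that were computed beforehand. The structure of model_sketch (one genes x clusters matrix per reference sample) and the helper project_sketch are hypothetical; the README does not describe the object returned by cluster_sim_spectrum, and the similarity and scaling steps here are assumptions.

```r
set.seed(2)
n_genes <- 40
genes <- paste0("gene", seq_len(n_genes))

# Hypothetical stand-in for a stored model:
# one genes x clusters profile matrix per reference sample
model_sketch <- list(
  sampleA = matrix(rexp(n_genes * 3), nrow = n_genes,
                   dimnames = list(genes, paste0("A_cl", 1:3))),
  sampleB = matrix(rexp(n_genes * 4), nrow = n_genes,
                   dimnames = list(genes, paste0("B_cl", 1:4)))
)

# Toy query data (assumption): genes in rows, query cells in columns
expr_query <- matrix(rpois(n_genes * 10, lambda = 5), nrow = n_genes,
                     dimnames = list(genes, paste0("q", 1:10)))

# Hypothetical helper: describe each query cell by its z-transformed
# similarity to the stored cluster profiles of every reference sample
project_sketch <- function(expr, profiles) {
  do.call(cbind, lapply(profiles, function(p) {
    shared <- intersect(rownames(expr), rownames(p))
    sim <- cor(expr[shared, , drop = FALSE], p[shared, , drop = FALSE],
               method = "spearman")
    t(scale(t(sim)))
  }))
}

css_query_sketch <- project_sketch(expr_query, model_sketch)
colnames(css_query_sketch)
```

Since the query is compared against the same cluster profiles as the reference cells, its columns line up with the reference representation, so both can be analyzed in one space.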