Primary language: R. License: MIT.

awesome-single-cell

List of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc. Contributions welcome...

Citation

DOI

Software packages

RNA-seq

  • anchor - [Python] - ⚓ Find bimodal, unimodal, and multimodal features in your data
  • ascend - [R] - ascend is an R package composed of fast, streamlined analysis functions optimized to address the statistical challenges of single cell RNA-seq. The package incorporates novel and established methods to provide a flexible framework to perform filtering, quality control, normalization, dimension reduction, clustering, differential expression and a wide range of plotting.
  • BackSPIN - [Python] - Biclustering algorithm developed taking into account intrinsic features of single-cell RNA-seq experiments.
  • BASiCS - [R] - Bayesian Analysis of single-cell RNA-seq data. Estimates cell-specific normalization constants. Technical variability is quantified based on spike-in genes. The total variability of the expression counts is decomposed into technical and biological components. BASiCS can also identify genes with differential expression/over-dispersion between two or more groups of cells.
  • BatchEffectRemoval - [Python] - Removal of Batch Effects using Distribution-Matching Residual Networks
  • BEARscc - [R] - BEARscc makes use of ERCC spike-in measurements to model technical variance as a function of gene expression and technical dropout effects on lowly expressed genes.
  • bonvoyage - [Python] - 📐 Transform percentage-based units into a 2d space to evaluate changes in distribution with both magnitude and direction.
  • BPSC - [R] - Beta-Poisson model for single-cell RNA-seq data analyses
  • CellCNN - [Python] - Representation Learning for detection of phenotype-associated cell subsets
  • Cellity - [R] - Classification of low quality cells in scRNA-seq data using R
  • cellTree - [R] - Cell population analysis and visualization from single cell RNA-seq data using a Latent Dirichlet Allocation model.
  • clusterExperiment - [R] - Functions for running and comparing many different clusterings of single-cell sequencing data. Meant to work with SCONE and slingshot.
  • CyteGuide - [C++,D3] - CyteGuide: Visual Guidance for Hierarchical Single-Cell Analysis
  • DECENT - [R] - The unique features of scRNA-seq data have led to the development of novel methods for differential expression (DE) analysis. However, few of the existing DE methods for scRNA-seq data estimate the number of molecules pre-dropout and therefore do not explicitly distinguish technical and biological zeroes. We develop DECENT, a DE method for scRNA-seq data that adjusts for the imperfect capture efficiency by estimating the number of molecules pre-dropout.
  • DECODE - [ ] - We develop an algorithm, called DECODE, to assess the extent of joint presence/absence of genes across different cells. We show that this network captures biologically-meaningful pathways, cell-type specific modules, and connectivity patterns characteristic of complex networks. We develop a model that uses this network to discriminate biological vs. technical zeros, by exploiting each gene's local neighborhood. For non-biological zeros, we build a predictive model to impute the missing value using their most informative neighbors.
  • DESCEND - [R] - DESCEND deconvolves the true gene expression distribution across cells for UMI scRNA-seq counts. It provides estimates of several distribution-based statistics (five distribution measurements and the coefficients of covariates (such as batches or cell size)).
  • destiny - [R] - Diffusion maps are a spectral method for non-linear dimension reduction introduced by Coifman et al. (2005). Diffusion maps are based on a distance metric (diffusion distance) which is conceptually relevant to how differentiating cells follow noisy diffusion-like dynamics, moving from a pluripotent state towards more differentiated states.
  • DeLorean - [R] - Bayesian pseudotime estimation algorithm that uses Gaussian processes to model gene expression profiles and provides a full posterior for the pseudotimes.
  • dropClust - [R/Python] - Efficient clustering of ultra-large scRNA-seq data.
  • dropsim - [R] - Simulating droplet based scRNA-seq data.
  • ECLAIR - [python] - ECLAIR stands for Ensemble Clustering for Lineage Analysis, Inference and Robustness. Robust and scalable inference of cell lineages from gene expression data.
  • embeddr - [R] - Embeddr creates a reduced dimensional representation of the gene space using a high-variance gene correlation graph and laplacian eigenmaps. It then fits a smooth pseudotime trajectory using principal curves.
  • Falco - [AWS cloud] - Falco: A quick and flexible single-cell RNA-seq processing framework on the cloud.
  • FastProject - [Python] - Signature analysis on low-dimensional projections of single-cell expression data.
  • flotilla - [Python] - Reproducible machine learning analysis of gene expression and alternative splicing data
  • GPfates - [Python] - Model transcriptional cell fates as mixtures of Gaussian Processes
  • GiniClust - [Python/R] - GiniClust is a clustering method implemented in Python and R for detecting rare cell-types from large-scale single-cell gene expression data. GiniClust can be applied to datasets originating from different platforms, such as multiplex qPCR data, traditional single-cell RNAseq or newly emerging UMI-based single-cell RNAseq, e.g. inDrops and Drop-seq.
  • HocusPocus - [R] - Basic PCA-based workflow for analysis and plotting of single cell RNA-seq data.
  • ICGS - [Python] - Iterative Clustering and Guide-gene Selection (Olsson et al. Nature 2016). Identify discrete, transitional and mixed-lineage states from diverse single-cell transcriptomics platforms. Integrated FASTQ pseudoalignment/quantification (Kallisto), differential expression, cell-type prediction and optional cell cycle exclusion analyses. Specialized methods for processing BAM and 10X Genomics sparse matrix files. Associated single-cell splicing PSI methods (MultIPath-PSI). Part of the AltAnalyze toolkit along with accompanying visualization methods (e.g., heatmap, t-SNE, SashimiPlots, network graphs). Easy-to-use graphical user and commandline interfaces.
  • knn-smoothing - [python or R or matlab] - The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on variance-stabilized and partially smoothed expression profiles, and then aggregating their transcript counts.
  • MAGIC - [python or matlab] - Markov Affinity-based Graph Imputation of Cells (MAGIC).
  • MAST - [R] - Model-based Analysis of Single-cell Transcriptomics (MAST) fits two-part generalized linear models that are specially adapted for bimodal and/or zero-inflated single cell gene expression data.
  • mfa - [R] - Bayesian modelling of bifurcations using a mixture of factor analysers
  • K-Branches - [R] - The main idea behind the K-Branches method is to identify regions of interest (branching regions and tips) in differentiation trajectories of single cells. So far, K-Branches is intended to be used on the diffusion map representation of the data, so the user should either provide the data in diffusion map space or use the destiny package to perform diffusion map dimensionality reduction.
  • M3Drop - [R] - Michaelis-Menten Modelling of Dropouts for scRNASeq.
  • MIMOSCA - [python] - A repository for the design and analysis of pooled single cell RNA-seq perturbation experiments (Perturb-seq).
  • Monocle - [R] - Differential expression and time-series analysis for single-cell RNA-Seq.
  • netSmooth - [R] - netSmooth is a network-diffusion based method that uses priors for the covariance structure of gene expression profiles on scRNA-seq experiments in order to smooth expression values. We demonstrate that netSmooth improves clustering results of scRNA-seq experiments from distinct cell populations, time-course experiments, and cancer genomics.
  • NetworkInference - [Julia] - Fast implementation of single-cell network inference algorithms: Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures
  • nimfa - [Python] - Nimfa is a Python scripting library which includes a number of published matrix factorization algorithms, initialization methods, quality and performance measures and facilitates the combination of these to produce new strategies. The library represents a unified and efficient interface to matrix factorization algorithms and methods.
  • OEFinder - [R] - Identify ordering effect genes in single cell RNA-seq data. OEFinder shiny implementation depends on packages shiny, shinyFiles, gdata, and EBSeq.
  • OncoNEM - [R] - OncoNEM is a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships.
  • Ouija - [R] - Incorporate prior information into single-cell trajectory (pseudotime) analyses using Bayesian nonlinear factor analysis.
  • outrigger - [Python] - Outrigger is a program to calculate alternative splicing scores of RNA-Seq data based on junction reads and a de novo, custom annotation created with a graph database, especially made for single-cell analyses.
  • pcaReduce - [R] - Hierarchical clustering of single cell transcriptional profiles.
  • PhenoPath - [R] - Single-cell pseudotime with heterogeneous genetic and environmental backgrounds, including Bayesian significance testing of interactions.
  • PoissonUMIs - [R] - Poisson Modelling of scRNASeq UMI counts.
  • SAVER - [R] - SAVER (Single-cell Analysis Via Expression Recovery) implements a regularized regression prediction and empirical Bayes method to recover the true gene expression profile in noisy and sparse single-cell RNA-seq data.
  • SAKE - [R] - Single-cell RNA-Seq Analysis and Clustering Evaluation.
  • SC3 - [R] - SC3 is a tool for the unsupervised clustering of cells from single cell RNA-Seq experiments.
  • Scanpy - [Python] - Scanpy provides computationally efficient tools that scale up to very large data sets and enables simple integration of advanced machine learning algorithms.
  • scater - [R] - Scater places an emphasis on tools for quality control, visualisation and pre-processing of data before further downstream analysis, filling a useful niche between raw RNA-sequencing count or transcripts-per-million data and more focused downstream modelling tools such as monocle, scLVM, SCDE, edgeR, limma and so on.
  • scDD - [R] - scDD (Single-Cell Differential Distributions) is a framework to identify genes with different expression patterns between biological groups of interest. In addition to traditional differential expression, it can detect differences that are more complex and subtle than a mean shift.
  • SCDE - [R] - Differential expression using error models and overdispersion-based identification of important gene sets.
  • SCell - [matlab] - SCell is an integrated software tool for quality filtering, normalization, feature selection, iterative dimensionality reduction, clustering and the estimation of gene-expression gradients from large ensembles of single-cell RNA-seq datasets. SCell is open source, and implemented with an intuitive graphical interface.
  • SCIMITAR - [Python] - Single Cell Inference of Morphing Trajectories and their Associated Regulation module (SCIMITAR) is a method for inferring biological properties from a pseudotemporal ordering. It can also be used to obtain progression-associated genes that vary along the trajectory, and genes that change their correlation structure over the trajectory (progression co-associated genes).
  • scImpute - [R] - scImpute: Accurate And Robust Imputation For Single Cell RNA-Seq Data
  • SCENIC - [R] - SCENIC: single-cell regulatory network inference and clustering
  • scvis - [python] - Interpretable dimensionality reduction of single cell transcriptome data with deep generative models
  • scLVM - [R] - scLVM is a modelling framework for single-cell RNA-seq data that can be used to dissect the observed heterogeneity into different sources, thereby allowing for the correction of confounding sources of variation. scLVM was primarily designed to account for cell-cycle induced variations in single-cell RNA-seq data where cell cycle is the primary source of variability.
  • scTDA - [Python] - scTDA is an object oriented python library for topological data analysis of high-throughput single-cell RNA-seq data. It includes tools for the preprocessing, analysis, and exploration of single-cell RNA-seq data based on topological representations.
  • scmap - [R] - scmap is a method for projecting cells from a scRNA-seq experiment on to the cell-types identified in a different experiment.
  • SCnorm - [R] - A quantile regression based approach for robust normalization of single cell RNA-seq data.
  • SCONE - [R] - SCONE (Single-Cell Overview of Normalized Expression), a package for single-cell RNA-seq data quality control (QC) and normalization. This data-driven framework uses summaries of expression data to assess the efficacy of normalization workflows.
  • SCORPIUS - [R] - SCORPIUS is an unsupervised approach for inferring developmental chronologies from single-cell RNA sequencing data. It accurately reconstructs trajectories for a wide variety of dynamic cellular processes. The performance was evaluated using a new, quantitative evaluation pipeline, comparing the performance of current state-of-the-art techniques on 10 publicly available single-cell RNA sequencing datasets. It automatically identifies marker genes, speeding up knowledge discovery.
  • SCOUP - [C++] - Uses probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation.
  • scran - [R] - This package implements a variety of low-level analyses of single-cell RNA-seq data. Methods are provided for normalization of cell-specific biases, pool-based norms to estimate size factors, assignment of cell cycle phase, and detection of highly variable and significantly correlated genes.
  • scTCRseq - [python] - Map T-cell receptor (TCR) repertoires from single cell RNAseq.
  • SCUBA - [matlab/R] - SCUBA stands for "Single-cell Clustering Using Bifurcation Analysis." SCUBA is a novel computational method for extracting lineage relationships from single-cell gene expression data, and modeling the dynamic changes associated with cell differentiation.
  • SEPA - [R] - SEPA provides convenient functions for users to assign genes into different gene expression patterns such as constant, monotone increasing and increasing then decreasing. SEPA then performs GO enrichment analysis to analyze the functional roles of genes with same or similar patterns.
  • Seurat - [R] - It contains easy-to-use implementations of commonly used analytical techniques, including the identification of highly variable genes, dimensionality reduction (PCA, ICA, t-SNE), standard unsupervised clustering algorithms (density clustering, hierarchical clustering, k-means), and the discovery of differentially expressed genes and markers.
  • SIMLR - [R, matlab] - SIMLR (Single-cell Interpretation via Multi-kernel LeaRning) learns an appropriate distance metric from the data for dimension reduction, clustering and visualization. SIMLR is capable of separating known subpopulations more accurately in single-cell data sets than do existing dimension reduction methods.
  • sincell - [R] - Existing computational approaches for the assessment of cell-state hierarchies from single-cell data might be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cells-clustering step). Sincell R package implements a methodological toolbox allowing flexible workflows under such framework.
  • sincera - [R] - R-based pipeline for single-cell analysis including clustering and visualization.
  • SingleSplice - [R, perl, C++] - A tool for detecting biological variation in alternative splicing within a population of single cells. See Welch et al. 2016.
  • singlet - [Python] - Single cell RNA-Seq analysis with phenotypes.
  • SinQC - [R] - A Method and Tool to Control Single-cell RNA-seq Data Quality.
  • SLICER - [R] - Selective Locally linear Inference of Cellular Expression Relationships (SLICER) algorithm for inferring cell trajectories.
  • slingshot - [R] - Functions for identifying and characterizing continuous developmental trajectories in single-cell sequencing data.
  • SPADE - [R] - Visualization and cellular hierarchy inference of single-cell data using SPADE.
  • splatter - [R] - Splatter is a package for the simulation of single-cell RNA sequencing count data. It provides a simple interface for creating complex simulations that are reproducible and well-documented.
  • SPRING - [matlab, javascript, python] - SPRING is a collection of pre-processing scripts and a web browser-based tool for visualizing and interacting with high dimensional data. SPRING was developed for single cell RNA-Seq data but can be applied more generally.
  • switchde - [R] - Differential expression analysis across pseudotime. Identify genes that exhibit switch-like up or down regulation along single-cell trajectories along with where in the trajectory the regulation occurs.
  • TASC - [C++, python] - To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences.
  • TASIC - [matlab] - TASIC is a new method for determining temporal trajectories, branching and cell assignments in single cell time series experiments. Unlike prior approaches, TASIC uses a probabilistic graphical model to integrate expression and time information, making it more robust to noise and stochastic variations.
  • TopSLAM - [python] - Extracting and using probabilistic Waddington's landscape recreation from single cell gene expression measurements.
  • TraCeR - [python] - Reconstruction of T-Cell receptor sequences from single-cell RNA-seq data.
  • TRAPeS - [python, C++] - TRAPeS (TCR Reconstruction Algorithm for Paired-End Single-cell), a software for reconstruction of T cell receptors (TCR) using short, paired-end single-cell RNA-sequencing.
  • TSCAN - [R] - Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.
  • ZIFA - [Python] - Zero-inflated dimensionality reduction algorithm for single-cell data.
  • zUMIs - [R, perl, shell] - zUMIs: A fast and flexible pipeline to process RNA-seq data with UMIs.

Copy number analysis

  • aneufinder - [R] - Bioconductor module for copy-number detection in single-cell whole genome sequencing (scWGS) and strand-seq data using a Hidden Markov Model or binary bisection method.
  • Ginkgo - [R, C] - Ginkgo is a web application for single-cell copy-number variation analysis.

Variant calling

  • monovar - [python] - Monovar is a single nucleotide variant (SNV) detection and genotyping algorithm for single-cell DNA sequencing data. It takes a list of bam files as input and outputs a vcf file containing the detected SNVs.
  • SSrGE - [python] - SSrGE is an approach to identify SNVs correlated with Gene Expression using multiple regularized linear regressions. It contains its own pipeline to infer SNVs from scRNA-seq reads and is able to identify and sort genes and SNVs for a given cell subgroup. Deposited in bioRxiv in December 2016.

Epigenomics

  • DeepCpG - [python] - DeepCpG is a deep neural network for predicting the methylation state of CpG dinucleotides in multiple cells. It can be used to accurately impute incomplete DNA methylation profiles, discover predictive sequence motifs, and quantify the effect of sequence mutations.
  • SCRAT - [R] - SCRAT provides essential tools for users to read in single-cell regulome data (ChIP-seq, ATAC-seq, DNase-seq) and summarize into different types of features. It also allows users to visualize the features, cluster samples and identify key features.

Multi-assay data integration

Other applications

Tutorials and workflows

Web portals and apps

  • 10X Genomics datasets - 10x genomics public datasets, including 1.3M cell mouse brain dataset.
  • ASAP - Automated Single-cell Analysis Pipeline (deposited in bioRxiv on December 22, 2016).
  • conquer - A repository of consistently processed, analysis-ready single-cell RNA-seq data sets.
  • D3E - Discrete Distributional Differential Expression (D3E) is a tool for identifying differentially-expressed genes, based on single-cell RNA-seq data.
  • Ginkgo - [R, C] - Ginkgo is a web application for single-cell copy-number variation analysis and visualization.
  • Granatum - Granatum 🍇 is a graphical single-cell RNA-seq (scRNA-seq) analysis pipeline for genomics scientists. Deposited in Feb. 2017.
  • JingleBells - A repository of standardized single cell RNA-Seq datasets for analysis and visualization in IGV at the single cell level. Currently focused on immune cells (http://www.jimmunol.org/content/198/9/3375.long).
  • Single Cell Portal - The Single-Cell Portal was developed to facilitate open data and open science in Single-cell Genomics. The portal currently focuses on sharing scientific results interactively, and sharing associated datasets.
  • SAKE - Single-cell RNA-Seq Analysis and Clustering Evaluation.
  • scmap - A web tool for fast and accurate mapping of cells to a reference database using scRNA-seq data
  • scRNA.seq.datasets - Collection of public scRNA-Seq datasets used by Hemberg Lab
  • scRNASeqDB - A database aggregating human single-cell RNA-seq datasets. ref
  • SCPortalen - SCPortalen: human and mouse single-cell centric database. ref

Journal articles of general interest

Paper collections

Big data approach overview

Experimental design

Methods comparisons

Similar lists and collections

People

Gender bias at conferences is a well-known problem (http://www.sciencemag.org/careers/2015/07/countering-gender-bias-conferences). Creating a list of potential speakers can help mitigate this bias, and having a community of people develop and maintain the list helps diversify it beyond smaller networks.

Female

Male