Singlet

Single cell RNA-Seq analysis with quantitative phenotypes.

Examples:

Documentation

Hosted on readthedocs.

Features

The vision is to let you explore your data your way while providing support for repetitive tasks. Here are a few things I do pretty regularly:

  • quality control and filtering
  • sample and feature filtering (e.g. querying by quantitative phenotypes in certain ranges)
  • dataset splitting (e.g. by metadata) and merging
  • bootstrapping
  • normalization
  • log/unlog transform
  • summary statistics (mean expression, standard deviation, coefficient of variation, Fano factor)
  • feature selection
  • clustering (e.g. k-means, affinity propagation)
  • dimensionality reduction and feature weighting including phenotypes (e.g. PCA, t-SNE, UMAP, SAM)
  • k nearest neighbors (knn) graphs
  • plotting dimensionality reductions colored by categorical or quantitative metadata
  • plotting hierarchical clustering
  • correlations of gene expression to gene expression or to quantitative phenotypes
  • differential expression at the distribution level (e.g. Mann-Whitney test)
  • load/write to loom files
  • support for custom plugins to expand the list of features at runtime
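To make the summary statistics in the list above concrete, here is a minimal sketch in plain NumPy. It does not use Singlet's API; the toy matrix, the gene names, and the choice of the sample standard deviation (ddof=1) are assumptions made for illustration only.

```python
import numpy as np

# Toy counts matrix (made-up numbers): rows are features (genes),
# columns are samples (cells).
genes = ["geneA", "geneB", "geneC"]
counts = np.array([
    [0.0, 2.0, 4.0, 6.0],
    [5.0, 5.0, 5.0, 5.0],
    [1.0, 0.0, 3.0, 0.0],
])

# Per-feature statistics, computed across samples (axis=1).
mean = counts.mean(axis=1)
std = counts.std(axis=1, ddof=1)  # sample standard deviation (assumption)
cv = std / mean                   # coefficient of variation
fano = std ** 2 / mean            # Fano factor: variance over mean

for gene, m, s, c, f in zip(genes, mean, std, cv, fano):
    print(f"{gene}: mean={m:.2f} std={s:.2f} cv={c:.2f} fano={f:.2f}")
```

Features with zero mean expression would cause a division by zero here, so real data usually needs such features filtered out first.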

Requirements

Python 3.5+ is required. Moreover, you will need:

Optional dependencies:

Get those from your Linux distribution, pip, conda, or any other source.

Singlet is pure Python for the time being, so it should work on any platform supported by its dependencies, in particular various Linux distributions, recent-ish OSX, and Windows. It is tested on Linux and OSX. If you are a Windows user and know how to use AppVeyor, let's set it up!

Install

To get the latest stable version, use pip:

pip install singlet

To get the latest development version, clone the git repo and then call:

python3 setup.py install

Usage example

You can have a look inside the test folder for examples. To start using the example dataset:

  • Set the environment variable SINGLET_CONFIG_FILENAME to the location of the example YAML file
  • Open a Python/IPython shell or a Jupyter notebook and type:
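import matplotlib.pyplot as plt
from singlet.dataset import Dataset

# Load the example dataset described in the YAML config file
ds = Dataset(
    samplesheet='example_PBMC2',
    counts_table='example_PBMC2',
    featuresheet='example_PBMC2',
    )

# Log-transform the counts in place
ds.counts.log(inplace=True)

# Cluster the samples with k-means and store the labels as sample metadata
ds.samplesheet['cluster'] = ds.cluster.kmeans(axis='samples', n_clusters=5)

# Compute a t-SNE embedding of the samples
vs = ds.dimensionality.tsne(perplexity=15)

# Plot the samples in the reduced space, colored by the cellType column
ax = ds.plot.scatter_reduced_samples(
    vs,
    color_by='cellType',
    figsize=(5, 4),
    )
plt.show()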

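This will calculate a t-SNE embedding of the log-transformed features and then show your samples in the reduced space, colored by cell type (the cellType column of the sample sheet). To color by the k-means clusters instead, pass color_by='cluster'. It should look more or less like this: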

t-SNE example
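The first step above, setting SINGLET_CONFIG_FILENAME, can also be done from within Python rather than in the shell. The sketch below is only an illustration: the file path is hypothetical, and it assumes that the variable has to be set before singlet is imported.

```python
import os
from pathlib import Path

# Hypothetical location of the example YAML file; adjust it to wherever
# the file lives in your checkout.
config_path = Path("example_data") / "config_example.yml"

# Set the variable before importing singlet (assumption: the config file
# is looked up when the package is first imported).
os.environ["SINGLET_CONFIG_FILENAME"] = str(config_path)

print(os.environ["SINGLET_CONFIG_FILENAME"])
```

After this, the import of singlet.dataset and the rest of the example can follow in the same session.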

Similar packages

Singlet is similar to other packages like Scanpy or Seurat. However, there are differences too:

  • Scanpy focuses on huge datasets and graphical methods. Singlet is not opinionated about graphs and works best with smaller datasets that include quantitative phenotypes (e.g. single cell size).
  • Seurat focuses on providing a simple user experience. Singlet does try to take over repetitive tasks (e.g. data filtering) but refuses to perform strongly opinionated operations without explicit user consent (e.g. normalization using a particular statistical model).
  • Singlet tries to use object-oriented programming to keep interfaces clean and has an open plugin structure.