/DoubletDetection

Doublet detection in single-cell RNA-seq data (R package).

Primary LanguageRMIT LicenseMIT

Notice

This repository is still in development

The testing of R implementation of functions provided in the Python version is ongoing. This package is not fully documented yet. Please share any problems you encounter in the documentation or functionality of the code as an issue on GitHub. Thank you for your patience.

Contributions to the R implementation via pull request are welcome.

DoubletDetection

Version 2.3.0

CRAN_Status_Badge Travis Build Status AppVeyor Build Status Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public. codecov

DoubletDetection is an R implementation of a package to detect doublets (technical errors) in single-cell RNA-seq count matrices.

Installation

To install DoubletDetection in R:

if(!require(devtools)){
  install.packages("devtools") # If not already installed
}
devtools::install_github("TomKellyGenetics/DoubletDetection", ref = "r-implementation")

Running

To run basic doublet classification:

library("DoubletDetection")
clf <- BoostClassifier$new()
# raw_counts is a genes by cells count matrix
labels = clf$fit(raw_counts)$predict()

raw_counts is a scRNA-seq count matrix or data.frame (genes x cells). Note that the dimensions of input matrix differs from the Python version.

labels is a binary numerical vector with the value 1 representing a detected doublet, 0 a singlet, and NA an ambiguous cell.

The classifier works best when there are several cell types present in the data. Furthermore, it should be applied individually to each run in an aggregated count matrix.

Usage

R Version

These functions and methods (for the Reference Class) have been documented and can be accessed in the R help system. A vignette will be prepared using Rmarkdown in due course.

Python Version

For the Python implementation, see the original repository: https://github.com/JonathanShor/DoubletDetection

See their jupyter notebook for an example on 8k PBMCs from 10x.

Obtaining data

Data can be downloaded from the 10x website.

Citations

R version

Please cite the R implementation as an R package using citation(DoubletDetection).

Adam J. Gayoso, Jonathan D. Shor, Ambrose J. Carr, and S. Thomas Kelly (2018). DoubletDetection: a package to detect doublets (technical errors) in single-cell RNA-seq count matrices. R package version 2.3.0 https://github.com/TomKellyGenetics/DoubletDetection"

Python version

Please acknowledge the original contributors when using the R implementation.

bioRxiv submission is in progress. Please refer to the Python Repository https://github.com/JonathanShor/DoubletDetection) for more details.

This project is licensed under the terms of the MIT license (in accordance with the license of the original repository).