infercna_light: An R repository from abikoushi

Overview

infercna provides functions for inferring copy-number alteration (CNA) values from single-cell RNA-seq (scRNA-seq) data, and for related queries.

infercna() to infer copy-number alterations from single-cell RNA-seq data
refCorrect() to convert relative CNA values to absolute values
- computed in infercna() if reference cells are provided
cnaPlot() to plot a heatmap of CNA values
cnaScatterPlot() to visualise malignant and non-malignant cell subsets
cnaCor() a parameter to identify cells with high CNAs
- computed in cnaScatterPlot()
cnaSignal() a second parameter to identify cells with high CNAs
- computed in cnaScatterPlot()
findMalignant() to find malignant subsets of cells
findClones() to identify genetic subclones
fitBimodal() to fit a bimodal Gaussian distribution
- used in findMalignant()
- used in findClones()
filterGenes() to filter genes by their genome features
splitGenes() to split genes by their genome features
orderGenes() to order genes by their genomic position
useGenome() to change the default genome configured with infercna
addGenome() to configure infercna with a new genome specified by the user

See the Reference tab for the full list of functions and their documentation pages.

Installation

To install infercna:

# install.packages("devtools")
devtools::install_github("jlaffy/infercna")

References

The methodology behind infercna has been applied in several high-impact publications. The idea of inferring CNAs from single-cell RNA-sequencing data was first formulated in the earliest of these papers (last listed).

Data requirements

The bare minimum for use in infercna is:

a single-cell expression matrix of genes by cells
- not centered
- normalised for sequencing depth and gene length (e.g. TPM, RPKM or CPM)
- optionally in log space, e.g. log2(TPM/10 + 1)
- Note: see also infercna::TPM and infercna::logTPM
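
As a rough illustration of the requirements above, the following base R sketch builds a toy genes-by-cells TPM matrix, applies the log2(TPM/10 + 1) transform, inverts it, and checks that the result is not centered. The matrix values and the variable names (tpm, log_tpm, tpm_back, is_centered) are made up for this example; the package's own infercna::TPM and infercna::logTPM helpers are deliberately not called here, since their exact signatures are not shown in this document.

```r
# Toy TPM matrix: genes in rows, cells in columns (values are made up).
tpm <- matrix(
  c(   0,  10, 250,
      30,   0,  90,
    1200, 640,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

# Log transform as described above: log2(TPM/10 + 1).
log_tpm <- log2(tpm / 10 + 1)

# Inverse transform, for when linear-scale values are needed again.
tpm_back <- 10 * (2^log_tpm - 1)

# A centered matrix has gene (row) means of approximately zero;
# the input to infercna should NOT be centered.
is_centered <- all(abs(rowMeans(log_tpm)) < 1e-8)
```

If is_centered were TRUE, the matrix would most likely have been mean-centered upstream and would need to be regenerated from the uncentered values.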

If you would like to compute absolute (rather than relative) CNA values, you should additionally provide:

a list of length two or more containing reference cell IDs of normal cells. For example, list(macrophages, oligodendrocytes).
- see the example reference infercna::refCells
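
A minimal sketch of what such a reference list might look like, using base R only. The cell IDs, the cell_ids vector standing in for the column names of an expression matrix, and the sanity checks are all assumptions made for illustration; they are not part of infercna itself.

```r
# Hypothetical cell IDs; in practice these come from your own annotation.
macrophages      <- c("cell1", "cell2")
oligodendrocytes <- c("cell3", "cell4")
ref_cells <- list(macrophages = macrophages,
                  oligodendrocytes = oligodendrocytes)

# Stand-in for colnames() of a genes-by-cells expression matrix.
cell_ids <- paste0("cell", 1:6)

# Sanity checks: at least two groups, every reference ID is present
# in the matrix, and no cell is assigned to more than one group.
all_refs       <- unlist(ref_cells, use.names = FALSE)
has_two_groups <- length(ref_cells) >= 2
all_present    <- all(all_refs %in% cell_ids)
no_overlap     <- !any(duplicated(all_refs))

# Cells that do not belong to any reference group.
query_cells <- setdiff(cell_ids, all_refs)
```

Checking the IDs against the matrix column names before running the analysis can catch mismatched naming early.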

Finally, if your genome is not available in the current implementation of infercna, you should additionally provide:

a genome dataframe, containing the columns: symbol, chromosome_name, start_position, arm.
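
The following base R sketch shows one way a genome data frame with the four required columns could be assembled and then ordered by genomic position. The gene symbols and coordinates are placeholders rather than real annotations, and the arm format ("1p", "1q") is an assumption; check an existing infercna genome for the exact conventions before building your own.

```r
# Hypothetical genome table with the four required columns.
# Gene names and coordinates are placeholders, not real annotations.
genome <- data.frame(
  symbol          = c("GENE_C", "GENE_A", "GENE_D", "GENE_B"),
  chromosome_name = c("2", "1", "10", "1"),
  start_position  = c(5000L, 120000L, 300L, 800L),
  arm             = c("2p", "1q", "10p", "1p"),  # assumed arm format
  stringsAsFactors = FALSE
)

# Verify that all required columns are present.
required <- c("symbol", "chromosome_name", "start_position", "arm")
stopifnot(all(required %in% names(genome)))

# Order genes by genomic position: chromosome first, then start position.
# A factor with explicit levels keeps "10" after "2" instead of sorting
# chromosome names alphabetically.
chr_levels <- c(as.character(1:22), "X", "Y")
chr <- factor(genome$chromosome_name, levels = chr_levels)
genome_ordered <- genome[order(chr, genome$start_position), ]
```

Here genome_ordered lists GENE_B, GENE_A, GENE_C, GENE_D, i.e. chromosome 1 (by start position), then chromosome 2, then chromosome 10.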

Example data

infercna ships with two example scRNA-seq datasets from two patients with glioblastoma, infercna::bt771 and infercna::mgh125, along with two normal reference groups, infercna::refCells. The matrices are stored as sparse matrices, and you can use infercna::useData() to load them as normal (dense) matrices. These patients are taken from a much larger cohort of 28 glioblastoma samples. You can look at the complete study here and can download the complete dataset via the Single Cell Portal.
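
To illustrate the difference between the sparse storage format and a normal matrix, here is a small sketch using the Matrix package (a recommended package distributed with R). It does not call infercna::useData(), whose arguments are not described in this document; it only shows the general sparse-to-dense conversion on made-up values.

```r
library(Matrix)

# Toy genes-by-cells matrix with many zeros (values are made up).
dense <- matrix(
  c(0, 0, 3, 0, 5, 0),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# Sparse representation: only the non-zero entries are stored.
sparse <- Matrix(dense, sparse = TRUE)

# Convert back to an ordinary base R matrix.
restored <- as.matrix(sparse)
```

Sparse storage saves memory for scRNA-seq data, where most entries are zero, while a dense matrix is what most base R matrix functions expect.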

Future implementations

Future implementations will include:

more default genomes to choose from
option to correct CNA values (to absolute) when just one reference is available.
more stuff…