/infercna_light

Infer Copy Number Alterations and Clonality in (Single-Cell) RNA-Seq Data (light ver.)

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

Overview

infercna aims to provide functions for inferring CNA values from scRNA-seq data and related queries.

  • infercna() to infer copy-number alterations from single-cell RNA-seq data
  • refCorrect() to convert relative CNA values to absolute values
    • computed in infercna() if reference cells are provided
  • cnaPlot() to plot a heatmap of CNA values
  • cnaScatterPlot() to visualise malignant and non-malignant cell subsets
  • cnaCor() a parameter to identify cells with high CNAs
    • computed in cnaScatterPlot()
  • cnaSignal() a second parameter to identify cells with high CNAs
    • computed in cnaScatterPlot()
  • findMalignant() to find malignant subsets of cells
  • findClones() to identify genetic subclones
  • fitBimodal() to fit a bimodal Gaussian distribution
    • used in findMalignant()
    • used in findClones()
  • filterGenes() to filter genes by their genome features
  • splitGenes() to split genes by their genome features
  • orderGenes() to order genes by their genomic position
  • useGenome() to change the default genome configured with infercna
  • addGenome() to configure infercna with a new genome specified by the user

See Reference tab for a full list and documentation pages.
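
To illustrate what fitting a bimodal Gaussian distribution involves, here is a minimal base R sketch that fits a two-component Gaussian mixture with the EM algorithm. It is not infercna's implementation of fitBimodal(): the function name `fit_two_gaussians`, its arguments and the simulated data are all hypothetical and chosen only for this example.

```r
# Fit a two-component Gaussian mixture to a numeric vector with EM (base R only).
# Hypothetical helper for illustration; not part of infercna.
fit_two_gaussians <- function(x, max_iter = 200, tol = 1e-8) {
  # Initialise the means from the lower and upper quartiles
  mu <- unname(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x), 2)
  lambda <- c(0.5, 0.5)
  loglik_old <- -Inf
  for (i in seq_len(max_iter)) {
    # E-step: weighted densities and posterior membership probabilities
    d1 <- lambda[1] * dnorm(x, mu[1], sigma[1])
    d2 <- lambda[2] * dnorm(x, mu[2], sigma[2])
    p1 <- d1 / (d1 + d2)
    p2 <- 1 - p1
    # M-step: update mixing weights, means and standard deviations
    lambda <- c(mean(p1), mean(p2))
    mu <- c(sum(p1 * x) / sum(p1), sum(p2 * x) / sum(p2))
    sigma <- c(sqrt(sum(p1 * (x - mu[1])^2) / sum(p1)),
               sqrt(sum(p2 * (x - mu[2])^2) / sum(p2)))
    # Log-likelihood under the parameters used in this E-step;
    # stop once it no longer changes
    loglik <- sum(log(d1 + d2))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mu = mu, sigma = sigma, lambda = lambda, posterior = cbind(p1, p2))
}

# Simulated scores with two modes (made-up data)
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 5, sd = 1))
fit <- fit_two_gaussians(x)

# Assign each observation to the more probable mode
group <- ifelse(fit$posterior[, 1] > 0.5, "low", "high")
```

In a CNA setting, `x` would be a per-cell score, and the two modes would be candidates for the non-malignant and malignant subsets. How infercna itself defines and thresholds such scores is documented on the package's reference pages.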

Installation

To install infercna:

# install.packages("devtools")
devtools::install_github("jlaffy/infercna")

References

The methodology behind infercna has been applied in several high-impact publications. The earliest of these papers (last listed) was the first to formulate the idea of inferring CNAs from single-cell RNA-sequencing data.

Data requirements

The bare minimum for use in infercna is:

  • a single-cell expression matrix of genes by cells
    • not centered
    • normalised for sequencing depth and gene length (e.g. TPM, RPKM or CPM)
    • optionally in log space, e.g. log2(TPM/10 + 1)
    • Note: also see infercna::TPM and infercna::logTPM
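
As a minimal sketch of the log transform mentioned above, the following base R snippet converts a toy TPM matrix to log2(TPM/10 + 1) and back. The matrix `tpm` and its gene and cell names are made up for illustration, and only base R is used rather than infercna's own helpers.

```r
# Toy genes-by-cells TPM matrix (hypothetical values, for illustration only)
tpm <- matrix(
  c( 0, 10,  30,
    70,  0, 150),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# Log-transform as log2(TPM/10 + 1); zeros stay at zero
log_tpm <- log2(tpm / 10 + 1)

# Invert the transform to recover the TPM values
tpm_back <- 10 * (2^log_tpm - 1)
```

Note that the matrix is left uncentered, as required above.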

If you would like to compute absolute (rather than relative) CNA values, you should additionally provide:

  • a list of length two or more containing reference cell IDs of normal cells, for example list(macrophages, oligodendrocytes).
    • see the example reference infercna::refCells
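
A reference list of this shape could be put together as in the sketch below. The cell IDs, the toy matrix `m` and the variable name `ref_cells` are hypothetical, and naming the list elements is a choice made for this example rather than something the text above requires.

```r
# Hypothetical cell IDs for two normal cell types (illustrative names only)
macrophages <- c("cell_01", "cell_02", "cell_03")
oligodendrocytes <- c("cell_04", "cell_05")

# Reference list: one character vector of cell IDs per normal cell type
ref_cells <- list(macrophages = macrophages, oligodendrocytes = oligodendrocytes)

# Toy expression matrix whose columns include the reference cells
m <- matrix(0, nrow = 2, ncol = 6,
            dimnames = list(c("geneA", "geneB"), sprintf("cell_%02d", 1:6)))

# Sanity checks: at least two groups, and every reference ID is a column of m
stopifnot(length(ref_cells) >= 2)
stopifnot(all(unlist(ref_cells) %in% colnames(m)))
```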

Finally, if your genome is not available in the current implementation of infercna, you should additionally provide:

  • a genome dataframe, containing the columns: symbol, chromosome_name, start_position, arm.
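
A genome dataframe with these four columns might look like the sketch below. The gene symbols and coordinates are invented, and the chromosome naming ("1", "2", ...) and arm labels ("1p", "1q", ...) are assumptions about the expected format, so compare them against one of the genomes shipped with infercna before use.

```r
# Hypothetical genome annotation; gene symbols and coordinates are made up
genome <- data.frame(
  symbol = c("GENE_C", "GENE_A", "GENE_B"),
  chromosome_name = c("2", "1", "1"),
  start_position = c(500000, 2000000, 150000000),
  arm = c("2p", "1p", "1q"),
  stringsAsFactors = FALSE
)

# Check that the required columns are present
required <- c("symbol", "chromosome_name", "start_position", "arm")
stopifnot(all(required %in% colnames(genome)))

# Order genes by chromosome, then by start position within each chromosome
chr_levels <- c(as.character(1:22), "X", "Y")
genome$chromosome_name <- factor(genome$chromosome_name, levels = chr_levels)
genome <- genome[order(genome$chromosome_name, genome$start_position), ]
```

Using a factor with explicit levels keeps chromosome "10" from sorting before chromosome "2", which is what plain character sorting would do.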

Example data

infercna ships with two example scRNA-seq datasets from two glioblastoma patients, infercna::bt771 and infercna::mgh125, along with two normal reference groups, infercna::refCells. The matrices are stored as sparse matrices; use infercna::useData() to load them as normal matrices. These patients are taken from a much larger cohort of 28 glioblastoma samples. You can look at the complete study here and download the complete dataset via the Single Cell Portal.

Future implementations

Future implementations will include:

  • more default genomes to choose from
  • an option to correct CNA values (to absolute) when only one reference is available
  • more stuff…