infercna aims to provide functions for inferring CNA values from scRNA-seq data and related queries.
- infercna() to infer copy-number alterations from single-cell RNA-seq data
- refCorrect() to convert relative CNA values to absolute values
  - computed in infercna() if reference cells are provided
- cnaPlot() to plot a heatmap of CNA values
- cnaScatterPlot() to visualise malignant and non-malignant cell subsets
- cnaCor() a parameter to identify cells with high CNAs
  - computed in cnaScatterPlot()
- cnaSignal() a second parameter to identify cells with high CNAs
  - computed in cnaScatterPlot()
- findMalignant() to find malignant subsets of cells
- findClones() to identify genetic subclones
- fitBimodal() to fit a bimodal Gaussian distribution
  - used in findMalignant()
  - used in findClones()
- filterGenes() to filter genes by their genome features
- splitGenes() to split genes by their genome features
- orderGenes() to order genes by their genomic position
- useGenome() to change the default genome configured with infercna
- addGenome() to configure infercna with a new genome specified by the user
See Reference tab for a full list and documentation pages.
To install infercna:
# install.packages("devtools")
devtools::install_github("jlaffy/infercna")
The methodology behind infercna has been tried and tested in several high-impact publications. It was actually in the earliest of these papers (last listed) that the idea to infer CNAs from single-cell RNA-sequencing data was first formulated.
The bare minimum for use in infercna is:
- a single-cell expression matrix of genes by cells
  - not centered
  - normalised for sequencing depth and gene length (e.g. one of TPM, RPKM, CPM, etc.)
  - optionally in log space, e.g. log2(TPM/10 + 1)
  - Note: also see infercna::TPM and infercna::logTPM
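The input requirements above can be sketched in base R. This is a minimal illustration on made-up values, not part of infercna: the names `tpm`, `log_tpm` and `is_valid_input` are hypothetical, and the non-negativity check is only a heuristic for spotting a matrix that has already been centered.

```r
# Toy TPM matrix: genes in rows, cells in columns (values are made up).
set.seed(1)
tpm <- matrix(
  rexp(12, rate = 0.01),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Move to log space with the transform quoted above: log2(TPM/10 + 1).
log_tpm <- log2(tpm / 10 + 1)

# Hypothetical helper: checks for a numeric genes-by-cells matrix with
# row and column names, no missing values, and no negative values
# (a centered matrix would normally contain negatives).
is_valid_input <- function(m) {
  is.matrix(m) && is.numeric(m) &&
    !is.null(rownames(m)) && !is.null(colnames(m)) &&
    !anyNA(m) && all(m >= 0)
}

is_valid_input(log_tpm)
```

A row-centered copy such as `log_tpm - rowMeans(log_tpm)` fails the same check, which is the situation the "not centered" requirement is meant to avoid.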
If you would like to compute absolute (rather than relative) CNA values, you should additionally provide:
- a list of length two or more containing reference cell IDs of normal cells, for example list(macrophages, oligodendrocytes)
  - see example reference infercna::refCells
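A reference list of this shape might be built as follows. The cell IDs and the helper `check_ref_cells` are hypothetical and only illustrate the structure described above (a list of two or more character vectors of cell IDs); the actual layout of infercna::refCells should be checked in the package itself.

```r
# Hypothetical cell IDs; in practice these would be column names of the
# expression matrix.
cell_ids <- paste0("cell", 1:10)
macrophages <- c("cell1", "cell2", "cell3")
oligodendrocytes <- c("cell4", "cell5")

# A list of length two or more, one character vector per normal cell type.
ref_cells <- list(macrophages = macrophages, oligodendrocytes = oligodendrocytes)

# Hypothetical helper: basic consistency checks on the reference list.
check_ref_cells <- function(refs, ids) {
  all_refs <- unlist(refs, use.names = FALSE)
  is.list(refs) && length(refs) >= 2 &&
    all(all_refs %in% ids) &&        # every reference cell is in the matrix
    anyDuplicated(all_refs) == 0     # no cell sits in two reference groups
}

check_ref_cells(ref_cells, cell_ids)
```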
Finally, if your genome is not available in the current implementation of infercna, you should additionally provide:
- a genome dataframe containing the columns symbol, chromosome_name, start_position and arm.
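As a rough sketch, a genome dataframe with these four columns could look like the one below. The gene symbols, positions and the arm labels (written here as chromosome plus p/q) are made-up assumptions for illustration; the ordering step is plain base R and is not infercna's own orderGenes().

```r
# Made-up genome table with the four required columns.
genome <- data.frame(
  symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  chromosome_name = c("1", "1", "2", "X"),
  start_position = c(150000L, 20000L, 5000L, 90000L),
  arm = c("1q", "1p", "2p", "Xq"),   # assumed label format
  stringsAsFactors = FALSE
)

# Confirm the required columns are all present.
required <- c("symbol", "chromosome_name", "start_position", "arm")
stopifnot(all(required %in% colnames(genome)))

# Order genes by chromosome (1..22, X, Y), then by start position.
chr_levels <- c(as.character(1:22), "X", "Y")
ord <- order(match(genome$chromosome_name, chr_levels), genome$start_position)
ordered_symbols <- genome[ord, "symbol"]
ordered_symbols
```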
infercna is built with two example datasets of scRNA-seq data from two patients with Glioblastoma, infercna::bt771 and infercna::mgh125, along with two normal reference groups, infercna::refCells. The matrices are stored as sparse matrices and you can use infercna::useData() to load them as normal matrices. These patients are taken from a much larger cohort of 28 Glioblastoma samples. You can look at the complete study here and can download the complete dataset via the Single Cell Portal.
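For readers unfamiliar with the sparse versus normal matrix distinction, the generic conversion can be sketched with the Matrix package. This is an assumption-laden illustration on a made-up matrix; it does not show how infercna::useData() is implemented, and `sparse_m` and `dense_m` are hypothetical names.

```r
library(Matrix)  # recommended package shipped with standard R installations

# A small made-up sparse matrix standing in for a stored dataset.
sparse_m <- Matrix(
  c(0, 0, 3, 0, 1, 0),
  nrow = 2,
  sparse = TRUE,
  dimnames = list(c("gene1", "gene2"), c("cell1", "cell2", "cell3"))
)

# Convert to an ordinary base-R matrix, keeping gene and cell names.
dense_m <- as.matrix(sparse_m)

is.matrix(sparse_m)  # FALSE: sparse Matrix objects are S4 objects
is.matrix(dense_m)   # TRUE: a plain numeric matrix
```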
Future implementations will include:
- more default genomes to choose from
- option to correct CNA values (to absolute) when just one reference is available.
- more stuff…