dreval
is an R package aimed at evaluation and comparison of reduced dimension
representations of high-dimensional data. Given one or more reduced dimension
representations, and a "reference" representation (which can be the original,
high-dimensional representation or a baseline low-dimensional one), dreval
will calculate a collection of metrics quantifying how well each of the
evaluated representations recapitulates the structure of the observations in the
reference representation.
To install dreval
, you need the remotes
(or devtools
) R package, which can
be installed from CRAN. The following commands installs first remotes
, then
dreval
.
install.packages("remotes")
remotes::install_github("csoneson/dreval")
The input to dreval
is a
SingleCellExperiment
object, containing one or more assays and one or more reduced dimension
representation. By default, the logcounts
assay will be used as the reference
representation, against which each of the provided reduced dimension
representations will be evaluated. However, any other assay or reduced dimension
representation can be used as the reference data, by setting the arguments to
the dreval()
function accordingly.
The package contains a small example single-cell RNA-seq data set with
measurements for approximately 1,800 highly variable genes across 2,700 PBMCs.
The object contains eight reduced dimension representations: 25-dimensional PCA,
2-dimensional PCA, and 2-dimensional t-SNE and 2-dimensional UMAP
representations generated with different values of the perplexity/number of
nearest neighbors. We use the dreval()
function to evaluate how well each of
these retain the structure of the cells based on the logcounts
assay.
data(pbmc3ksub)
dre <- dreval(sce = pbmc3ksub, refType = "assay",
refAssay = "logcounts", nSamples = 1000, kTM = 50)
For detailed information about the arguments to dreval()
we refer to the
help page of the function:
?dreval
The output of dreval()
is a list with two elements, named scores
and
plots
. The scores
element is a data.frame with all the calculated evaluation
scores for each of the reduced dimension representations, while the plots
element is a list of diagnostic plots.
The plotRankSummary()
function can be used to aggregate the information across
all evaluation metrics. Each reduced dimension representation will be assigned a
rank for each metric, and the sum of these ranks across all metrics, as well as
the contribution from each metric, is visualized by the function. Metrics aimed
at evaluating the preservation of global structure are colored blue, while those
aimed at evaluating the preservation of local structure are colored red.
plotRankSummary(dre$scores)
- A python framework for reduced dimension representation evaluation was proposed by Heiser and Lau (2019). Code is available on GitHub. This study proposed to use the Mantel correlation between distance matrices, the Earth Mover's Distance between distance distributions, and the percentage of total binary elements of the KNN matrix that are conserved as evaluation metrics.
- The dimRed R package implements a collection of quality scores for reduced dimension representations, including Q_local, Q_global (based on the co-ranking matrix) and various correlation measures.
- The quadra R package implements quality scores for reduced dimension representations based on preservation of neighborhoods and agreement with known labels.