tdaunif: Uniform manifold samplers for topological data analysis
This R package contains sampling functions for topological manifolds embedded (or immersed) in Euclidean space.
motivation
consolidation
Statistical topology and topological data analysis include a variety of methods for detecting topological structure from point cloud data sets, most famously persistent homology. (See Weinberger (2011) for a brief introduction.) These methods are often validated by applying them to point clouds sampled from spaces with known topology. Functions that generate such samples are therefore valuable to developers of topological–statistical software.
Such samplers have not been easy to find in the R package ecosystem. Handfuls exist in several different packages, but no single package or code repository has assembled a large collection for convenient use. That is the goal of this package.
uniformity
The simplest validations use points sampled uniformly from the surfaces
of manifolds embedded in Euclidean space. Uniformity here is with
respect to the surface area of the embedded manifold, but such uniform
sampling is not trivial to do when the manifold can only be expressed
via parameterization from a simpler parameter space, as when a square
is rolled and bent into a
torus. Uniform sampling
on the parameter space may then produce uneven sampling on the manifold,
regions of greater compression being oversampled relative to regions of
greater expansion. One especially important motivation for this package
is therefore to include uniform samplers for popular embeddings of
topological manifolds, including functions that allow users to build
their own with minimal effort (see help(manifold, package = "tdaunif")
).
usage
installation
tdaunif is not yet on CRAN, but you can install it from GitHub using the remotes package:
remotes::install_github("corybrunson/tdaunif")
illustration
An intuitive embedding of the Klein bottle into 4-space is adapted from the popular “donut” embedding of the torus in 3-space. In tdaunif this is called the “tube” parameterization. We can sample uniformly from this embedding (without noise) and examine the coordinate projections as follows:
set.seed(0)
x <- sample_klein_tube(n = 360, ar = 3, sd = 0)
pairs(x, asp = 1, pch = 19, cex = .5)
Compare these neat projections to those of the same sample with noise added in the ambient Euclidean space:
x <- add_noise(x, sd = .2)
pairs(x, asp = 1, pch = 19, cex = .5)
For a thorough discussion of this sampler, see my blog post describing the method presented in Diaconis, Holmes, and Shahshahani (2013).
another illustration
A subset of samplers allow stratified sampling, which relies on an analytic solution to the problem of length or area distortion created by most parameterizations. For example, the planar triangle sampler can be stratified to produce a more uniform arrangement of points than a properly uniform sample would produce:
equilateral <- cbind(c(0,0), c(0.5,sqrt(3)/2), c(1,0))
x <- sample_planar_triangle(n = 720, triangle = equilateral)
y <- sample_planar_triangle(n = 720, triangle = equilateral, bins = 24)
par(mfrow = c(1L, 2L))
plot(x, asp = 1, pch = 19, cex = .5)
plot(y, asp = 1, pch = 19, cex = .5)
par(mfrow = c(1L, 1L))
See Arvo (1995) and Arvo’s notes from Siggraph 2001 for a detailed treatment of this techniquue.