pavpop is an R package for fitting phase and amplitude varying population pattern (pavpop) models on curve data.
If you have a set of curves that all represent the same underlying object, but are not aligned, pavpop is your friend. But it can go further than other curve regirstation methods. It can register curves that contain systematic differences in shape. Consider for example the 12 repetitions of acceleration signals arising from a from a pen writing.
Using pavpop on the data will align them nicely, the result is shown in the following figure. But other methods can produce give similar results? ---------------------------------------------------True, but pavpop is more general than other available methods. It is based on maximum likelihood estimation, and given data it finds the most likely separation of the amplitude (vertical) and phase (horizontal) variation in data. Thus, in addition to the regularization imposed by the distributional assumptions, the joint model for amplitude and phase variation will be implicitly regularized by means of a likelihood tradeoff. Warping functions that produce implausible amplitude variation (e.g. by overfitting in few points) will be deemed unlikely, and thus be avoided. As a result, pavpop can produce plausible warping functions when there are large systematic differences between samples---even in cases where the naked eye may not recognize a good alignment. Consider the following simulated data set consisting of 10 curves each observed at 100 points. The curves follow the same mean function (dashed), but the mean has been warped and amplitude variation have been added.
So, let us try some different methods for aligning.L2 warping | Elastic curve alignment | pavpop |
---|---|---|
L2 warping simply estimates the warping function (two-anchor point increasing spline) that gives the smallest squared residual. Elastic curve alignment uses the celebrated quare-root velocity function framwork, in particular through the align_fPCA
function in the fdasrvf
R package. And pavpop, well, use a pavpop model.
Looking at the above figures, none of the alignments give overly clear results, although the elastic curve alignment generally seem to align the features better. Also, all the estimated mean curves (yellow dashed), seem to model the shape of the true mean curve (black dashed) quite well. But since the data is simulated, we can verify which method does the best. In the figures below, the estimated/predicted warping functions are plotted against the true warping functions.
L2 warping | Elastic curve alignment | pavpop |
---|---|---|
If the warping is good, all the curves should cluster closely, ideally along a line with slope 1 (little deformation of the mean). Here we see that both L2 warping and Elastic curve alignment have systematic deviations in the estimated warping functions, while pavpop predicts the warping functions neatly. The take home message: a good alignment is often not the one that gives the best visual impression.
-
As opposed to most other registration methods, pavpop treats phase and amplitude variation equally and fits the model with both effects simultaneously. Thus, the bias related to conventional preprocessing or two-step tehcniques is eliminated.
-
pavpop estimates all variance(/hyper/regularizaton) parameters from data using maximum likelihood estimation.
-
Since pavpop does likelihood-based inference, conventional model checking tools can be used and a large array of classical statistical procedures can be imposed for further analyses.
You can install the latest version of pavpop by running the following code in R.
if (packageVersion("devtools") < 1.6) {
install.packages("devtools")
}
devtools::install_github('larslau/pavpop')
L.L. Raket, S. Sommer, and B. Markussen, “A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data,” Pattern Recognition Letters, vol. 38, pp. 1-7, 2014. DOI: 10.1016/j.patrec.2013.10.018
Version 0.10 marks a major update of the package. Most importantly, the documentation has been updated throughout and a large number of new examples and vignettes have been added. The output of the vignettes is now readily available in in pdf format in the vignettes
folder. In addition, the possibilities for specifying amplitude varation (through Gaussian processes, or a functional basis with Gaussian weights) have been unified, and the results can now be handled similarly. The state of the pavpop_clustering
method is questionable in the current version, it will receive a major update in a future release.
The most important new feature is the possibility to use models with full unstructred covariances through the covariance function unstr_cov
. This is particularly useful for modeling warp variables and amplitude variation described in low-dimensional functional bases.
The method for predicting warps can now be controlled using the argument warp_optim_method
in pavpop
. It defaults to conjugate gradient which is markedly slower than before (where Nelder-Mead was used) but does seem to be more robust.
A new option amp_fct
to pavpop
gives the possibility of specifying a functional basis in which to express the amplitude variation. As a special case, this opens up the possibility for fitting sitar-like models. For more information, run the vignettes amplitude_basis.Rmd
and sitar.Rmd
.
The make_cov_fct
function now allows a new argument ns
for creating a flexible class of non-stationary covariance functions from a stationary covariance function by adding one or more new parameters describing the change over time. The parameters will be automatically be estimated using maximum likelihood estimation. This is particularly useful in growth curve analysis, where the scale of for example height variation is known to vary with age.
The pavpop
function now allows the logical argument warped_amp
, which, if set to true, fits the model where the amplitude effects are modeled in warped time. The likelihood function used for estimation does not yet include a linearization term for the random effect, which may give rise to a slight bias in the likelihood estimates.
Parallel computations and clustering are back, and single point samples are now are supported.
The package now supports a variety of warping functions (shifts, linear warps, piecewise linear, smooth) and specification of models with fixed warping effects. Parallel computations and clustering are temporarily suspended, but expected to reemerge in the new structure very soon!