UNDER DEVELOPMENT
Author: Damien Delforge
Email: damien.delforge@uclouvain.be
- Compatibility with Python 3
- Computing the weighted correlation matrix takes too long for large time series
- Reconstructing series takes too long for long time series
- Make the 'paired' plot able to plot any pair, not just consecutive ones
- Implement Toeplitz SSA
- Add a decimal place to the characteristic contribution
- Add a method for change point analysis that uses Singular Spectrum Transformation (SST)
Singular Spectrum Analysis (SSA) is a non-parametric, model-free method for time series decomposition, reconstruction, and forecasting. The general SSA workflow consists of (1) embedding the time series into a trajectory matrix of lagged vectors, (2) decomposing the trajectory matrix using singular value decomposition (SVD), and (3) grouping the resulting components, based on similarities between their singular values or eigenvectors, to reconstruct interpretable components of the original time series. The last step is usually supervised.
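For illustration only (not this package's actual implementation), a minimal NumPy sketch of these three steps; the function name ssa_decompose, the window length L, and the toy series are assumptions:

```python
import numpy as np

def ssa_decompose(x, L):
    """Basic (Broomhead-King) SSA sketch: embed, decompose, reconstruct.

    Returns the elementary reconstructed series, one per singular triple.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # (1) Embedding: build the L x K Hankel trajectory matrix.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # (2) Decomposition: SVD of the trajectory matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # (3) Reconstruction: turn each rank-1 term s_i * u_i v_i^T back into
    # a series by anti-diagonal averaging (hankelization).
    series = []
    for i in range(s.size):
        Xi = s[i] * np.outer(U[:, i], Vt[i, :])
        xi = np.array([np.mean(Xi[::-1, :].diagonal(k))
                       for k in range(-L + 1, K)])
        series.append(xi)
    return np.array(series)

# Toy example: trend + seasonality + noise.
t = np.arange(200)
x = 0.02 * t + np.sin(2 * np.pi * t / 12) + 0.3 * np.random.randn(200)
components = ssa_decompose(x, L=40)
trend_and_cycle = components[:3].sum(axis=0)  # grouping is supervised
```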
The main hypothesis behind SSA is the separability of the components. Typical applications of SSA include:
- Smoothing, filtering, noise reduction
- Extraction of structured components (e.g., trend or seasonality)
- Filling missing values
- Forecasting
- Nonlinear time series analysis
Different variants of SSA can be distinguished by the embedding method, the decomposition method, or the dimensional nature of the dataset of interest.
- Basic SSA: the basic 1D SSA algorithm, also known as the Broomhead-King variant of SSA (BK-SSA). The time series is embedded in a Hankel trajectory matrix.
- Toeplitz SSA: the Toeplitz variant of SSA, also known as the Vautard-Ghil variant (VG-SSA). The time series is embedded in a Toeplitz lag-covariance matrix (see the sketch after this list). Toeplitz SSA should be used when the time series is known to be stationary.
- SSA-ICA: ICA stands for Independent Component Analysis, which replaces SVD in the decomposition step. This variant helps to separate components in cases of weak separability. Because ICA is less stable than SVD, SSA-ICA is best used in a two-stage procedure: a first separation is performed with the basic SVD method, then the remaining mixed components are further decomposed with SSA-ICA.
Other variants are not one-dimensional, such as Multichannel SSA (multiple time series), 2D-SSA (arrays and images), and nD-SSA (n-dimensional arrays).
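To illustrate how the VG embedding differs from BK, here is a hypothetical sketch of the lag-covariance (Toeplitz) matrix construction; the helper name vg_lag_covariance and the particular autocovariance estimator are assumptions, not this package's API:

```python
import numpy as np
from scipy.linalg import toeplitz

def vg_lag_covariance(x, L):
    """Toeplitz (Vautard-Ghil) SSA embedding sketch: the L x L
    lag-covariance matrix C with C[i, j] = c(|i - j|)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # VG-SSA assumes a (weakly) stationary, centered series
    N = x.size
    # Lagged autocovariances c(0), ..., c(L-1).
    c = np.array([np.sum(x[:N - k] * x[k:]) / (N - k) for k in range(L)])
    return toeplitz(c)

# Eigendecomposition of the symmetric Toeplitz matrix replaces the SVD
# of the Hankel trajectory matrix used in basic (BK) SSA.
C = vg_lag_covariance(np.random.randn(500), L=50)
eigvals, eigvecs = np.linalg.eigh(C)                # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest first
```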
TODOC
see: https://stats.stackexchange.com/a/159602/87558
see: https://cran.r-project.org/web/packages/Rssa/Rssa.pdf
The Rssa package uses either 'nutrlan', 'propack', full SVD with the dgesdd routine, or 'eigen', i.e. full SVD via an eigendecomposition of the cross-product matrix.
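The 'eigen' route can be sketched as follows; this is a simplified illustration (assuming X has full column rank and at least as many rows as columns), and svd_via_crossprod is a hypothetical helper, not Rssa's code:

```python
import numpy as np

def svd_via_crossprod(X):
    """Recover U, s, Vt of X from the eigendecomposition of X.T @ X.
    Sketch only: assumes full column rank and m >= n."""
    # Eigendecomposition of the (symmetric) cross-product matrix.
    w, V = np.linalg.eigh(X.T @ X)          # ascending eigenvalues
    w, V = w[::-1], V[:, ::-1]              # largest first
    s = np.sqrt(np.clip(w, 0.0, None))      # singular values
    U = (X @ V) / s                         # left singular vectors
    return U, s, V.T

X = np.random.randn(100, 20)
U, s, Vt = svd_via_crossprod(X)
assert np.allclose(U @ np.diag(s) @ Vt, X)
```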
see: https://code.lbl.gov/pipermail/trlan-users/2009-May/000007.html
Here is a slightly longer answer to your question on comparing ARPACK with TRLan. TRLan (and nuTRLan) implements a restarted version of the Lanczos algorithm, just as ARPACK implements a restarted version of the Arnoldi algorithm. In this regard, the user has control over the memory usage by controlling the maximum basis size. Another similarity is that both can keep an arbitrary number of basis vectors when restarting; this is the key advantage of these methods over earlier restarted versions.

On symmetric (or Hermitian) problems, when the basis vectors corresponding to the same Ritz values are saved during restarting, TRLan and ARPACK are theoretically equivalent. One difference is that TRLan uses Ritz vectors while ARPACK uses the vectors produced by the implicit QR procedure. This makes TRLan a little easier to understand and implement. This difference is mainly useful for software implementors; it is of no consequence to the end users.

What does have some consequence is the following. TRLan can take advantage of the symmetry in the original problem, as Ichi has pointed out. TRLan, and especially nuTRLan, use more advanced strategies to decide which Ritz values to save during restarting. These strategies have been demonstrated to be very effective. In general, the restarted version of Lanczos needs more matrix-vector multiplications than the un-restarted version. In cases where un-restarted Lanczos can be used, TRLan was shown to use nearly the same number of matrix-vector multiplications. On more difficult eigenvalue problems, TRLan usually performed better because of the new restarting strategies.
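For reference, SciPy exposes the ARPACK (implicitly restarted Lanczos) solver discussed above for symmetric problems as scipy.sparse.linalg.eigsh; a minimal usage sketch on a cross-product matrix:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

X = np.random.randn(1000, 200)
C = X.T @ X  # symmetric positive semi-definite cross-product matrix

# ARPACK: the 10 largest-magnitude eigenpairs, without a full decomposition.
w, V = eigsh(C, k=10, which='LM')
singular_values = np.sqrt(w[::-1])  # eigsh returns eigenvalues in ascending order
```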
- Finding Structure with Randomness: Stochastic Algorithms for Constructing Approximate Matrix Decompositions. Halko et al., 2009. http://arxiv.org/abs/arXiv:0909.4061
- A Randomized Algorithm for the Decomposition of Matrices. Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert.
- An Implementation of a Randomized Algorithm for Principal Component Analysis. A. Szlam et al., 2014.
- nplapack (numpy.linalg.svd): the decomposition is performed using LAPACK, with the option 'full_matrices'.
- splapack (scipy.linalg.svd): the decomposition is performed using LAPACK, with the option 'full_matrices'; additionally, the user can choose a 'lapack_driver'.
- sparpack (scipy.sparse.linalg.svds): computes the largest k singular values/vectors for a sparse matrix. This is a naive implementation using ARPACK as an eigensolver on A.H * A or A * A.H, depending on which one is more efficient.
- skarpack/skrandom (sklearn.decomposition.TruncatedSVD): this estimator supports two algorithms: a fast randomized SVD solver, and a "naive" algorithm that uses ARPACK as an eigensolver on (X * X.T) or (X.T * X), whichever is more efficient.
Both nplapack and splapack use the LAPACK algorithm for full SVD decomposition, but the scipy implementation allows more flexibility; hence it is kept. The same goes for sparpack and skarpack, as skarpack is just a wrapper around sparpack with fewer allowed arguments. To my knowledge, skrandom has no equivalent. The solvers kept are: splapack, sparpack, and skrandom.
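To make the retained options concrete, here is a small usage sketch; the mapping of the internal names splapack, sparpack, and skrandom to the calls below is an assumption based on the descriptions above:

```python
import numpy as np
from scipy.linalg import svd as splapack_svd
from scipy.sparse.linalg import svds as sparpack_svds
from sklearn.utils.extmath import randomized_svd

X = np.random.randn(300, 100)
k = 5  # number of leading singular triples

# splapack: full SVD via LAPACK; 'lapack_driver' selects gesdd or gesvd.
U, s, Vt = splapack_svd(X, full_matrices=False, lapack_driver='gesdd')
U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

# sparpack: truncated SVD via ARPACK; singular values come back ascending.
U2, s2, Vt2 = sparpack_svds(X, k=k)
order = np.argsort(s2)[::-1]
U2, s2, Vt2 = U2[:, order], s2[order], Vt2[order, :]

# skrandom: randomized truncated SVD (Halko et al., 2009).
U3, s3, Vt3 = randomized_svd(X, n_components=k, random_state=0)

# The three solvers should agree on the leading spectrum
# (the randomized one only approximately).
assert np.allclose(s, s2) and np.allclose(s, s3, rtol=1e-2)
```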
TODOC