/bigtime

Sparse estimation of large time series models

Primary LanguageHTML

bigtime: Sparse Estimation of Large Time Series Models

CRAN_Version_Badge CRAN_Downloads_Badge License_GPLv2_Badge License_GPLv3_Badge

The goal of bigtime is to sparsely estimate large time series models such as the Vector AutoRegressive (VAR) Model, the Vector AutoRegressive with Exogenous Variables (VARX) Model, and the Vector AutoRegressive Moving Average (VARMA) Model. The univariate cases are also supported.

Installation

The package can be installed from CRAN using

install.packages("bigtime")

You can install the development version (= the latest version) of bigtime from github as follows:

# install.packages("devtools")
devtools::install_github("ineswilms/bigtime")

Plotting the data

bigtime comes with example data sets created from a VAR, VARMA and VARX DGP. These data sets are called var.example, varma.example, and varx.example respectively and can be loaded into the environment by calling data(var.example) and similarly for the others. We can have a look at the varx.example data set by first loading it into the environment and then, using a little utility function, plotting it.

library(bigtime)
suppressMessages(library(tidyverse)) # Will be used for nicer visualisations
#> Warning: package 'tidyverse' was built under R version 4.0.5
#> Warning: package 'ggplot2' was built under R version 4.0.3
#> Warning: package 'readr' was built under R version 4.0.4
#> Warning: package 'purrr' was built under R version 4.0.3
#> Warning: package 'dplyr' was built under R version 4.0.5
#> Warning: package 'stringr' was built under R version 4.0.3
#> Warning: package 'forcats' was built under R version 4.0.4
data(varx.example) # loading the varx example data



plot_series <- function(Y){
  as_tibble(Y) %>%
  mutate(Time = 1:n()) %>%
  pivot_longer(-Time, names_to = "Series", values_to = "vals") %>%
  mutate(Series = factor(Series, levels = colnames(Y))) %>%
  ggplot() +
  geom_line(aes(Time, vals)) + 
  facet_wrap(facets = vars(Series), ncol = 1) + 
  ylab("") +
  theme_bw()
}

plot_series(Y.varx)

plot_series(X.varx)

Multivariate Time Series Models

Vector AutoRegressive (VAR) Models

Simulation

While we could use the example data provided, bigtime also supports simulation of VAR models using both lasso and elementwise type sparsity patterns. Since lasso is the most widely known one, we will start off with lasso. To simulate a VAR model having lasso type of sparsity, use the simVAR function and set sparsity_pattern="lasso". The lasso sparsity pattern has the additional num_zero option which determines the number of zeros in the VAR coefficient matrix (excluding intercepts). Note: we also set a seed so that the simulation is replicable. We can then use the summary function to obtain a summary of the simulated data.

periods <- 200 # time series length
k <- 5 # number of time series
p <- 8 # maximum lag 
sparsity_options <- list(num_zero = 15) # 15 zeros across the k=5 VAR coefficient matrices
sim_data <- simVAR(periods = periods, k = k, p = p, 
                   sparsity_pattern = "lasso", 
                   sparsity_options = sparsity_options,
                   seed = 123456, 
                   decay = 0.01) # the smaller decay the larger early coefs w.r.t. to later ones
Y.lasso <- sim_data$Y
summary(sim_data) # obtaining a summary of the simulated time series data
#> #### General Information #### 
#> 
#> Seed                                        123456 
#> Time series length                          200 
#> Burnin                                      200 
#> Variables Simulated                         5 
#> Number of Lags                              8 
#> Coefficients were randomly created?         TRUE 
#> Maximum Eigenvalue of Companion Matrix      0.8 
#> Sparsity Pattern                            lasso 
#> 
#> 
#> #### Sparsity Options #### 
#> 
#> $num_zero
#> [1] 15
#> 
#> 
#> 
#> #### Coefficient Matrix #### 
#> 
#>                  Y1            Y2            Y3            Y4            Y5
#> Y1.L1  3.344178e-01 -1.107252e-01 -1.423944e-01  3.509198e-02  9.033998e-01
#> Y2.L1  3.347094e-01  5.264215e-01  1.003833e+00  4.685881e-01 -1.709388e-01
#> Y3.L1 -3.995565e-01 -4.468152e-01 -2.235442e-02  4.710753e-01  4.224553e-01
#> Y4.L1  2.310627e-02 -2.948322e-01  3.732432e-01  6.691342e-01  2.244958e-01
#> Y5.L1  0.000000e+00  5.040150e-01  1.543970e-02  7.602611e-02  1.855509e-01
#> Y1.L2 -1.714191e-03  6.652793e-05  2.827333e-03  3.898174e-03 -2.488847e-03
#> Y2.L2 -3.432958e-03  2.790046e-04 -4.196394e-03 -1.102596e-02 -4.531967e-03
#> Y3.L2 -3.456294e-03  6.257595e-03  4.071610e-03  4.187554e-03 -4.475995e-03
#> Y4.L2  0.000000e+00  3.882016e-03  6.860267e-04 -3.594939e-03  6.349122e-04
#> Y5.L2 -2.013360e-03  0.000000e+00 -4.561977e-04  4.355841e-03 -4.860029e-03
#> Y1.L3 -7.090490e-05 -1.972221e-05  1.289428e-05  0.000000e+00  0.000000e+00
#> Y2.L3 -1.362040e-05 -4.321743e-05 -5.979591e-05 -1.013789e-05 -4.890420e-06
#> Y3.L3 -2.603130e-05  1.255775e-05  4.926041e-06 -3.356641e-05  2.408345e-05
#> Y4.L3 -9.864658e-06 -7.407063e-06  9.288386e-07 -1.943981e-05 -2.959808e-05
#> Y5.L3  5.224473e-05  2.264257e-05 -7.240193e-05  1.758216e-05 -5.780336e-05
#> Y1.L4  3.821882e-07 -2.899947e-07  1.955819e-08 -6.271466e-07 -9.234996e-07
#> Y2.L4  4.644695e-07 -2.826756e-07 -6.312740e-07  2.079154e-07 -4.271531e-07
#> Y3.L4  1.887393e-08  3.401594e-07  1.735509e-07  2.097019e-07  0.000000e+00
#> Y4.L4 -1.992917e-07  5.054378e-07  2.266186e-07 -1.382372e-07  2.907276e-07
#> Y5.L4  3.465950e-07  1.481081e-07  6.351945e-07  2.421511e-08  5.162710e-08
#> Y1.L5 -3.412591e-09  4.406386e-09 -4.838797e-09  2.333901e-09  3.464337e-09
#> Y2.L5  3.943109e-09 -5.099138e-09 -4.262808e-12 -3.854296e-09  5.050800e-09
#> Y3.L5 -3.476037e-09  9.989383e-10 -3.456284e-10 -1.977003e-09 -1.078979e-09
#> Y4.L5 -1.601388e-09 -3.787661e-09  4.063988e-09 -1.639348e-09 -7.263552e-10
#> Y5.L5  4.748150e-09 -6.451814e-09 -9.114958e-09 -8.185059e-09 -2.599211e-09
#> Y1.L6  2.878848e-11 -4.147466e-12  0.000000e+00 -3.921230e-11  0.000000e+00
#> Y2.L6  1.757755e-12 -4.097641e-11  3.182509e-11  2.012060e-11 -6.235119e-11
#> Y3.L6  3.543788e-11 -8.333751e-12  7.345804e-11  7.110232e-12 -3.232298e-11
#> Y4.L6  7.813330e-11  4.861321e-12 -1.423732e-11  4.223140e-11  4.665353e-11
#> Y5.L6 -3.204189e-11  3.110420e-11 -8.002152e-11 -1.406136e-11  1.527682e-11
#> Y1.L7  1.035718e-13  3.868736e-13  0.000000e+00  3.274603e-14 -6.488767e-14
#> Y2.L7  5.105382e-13 -2.889064e-13  6.181295e-13  3.424112e-13  3.218496e-13
#> Y3.L7 -1.374288e-13  4.027135e-14  6.096383e-14  0.000000e+00 -1.371909e-13
#> Y4.L7 -5.274508e-13 -7.661388e-13  2.324513e-13 -6.879730e-13  0.000000e+00
#> Y5.L7 -6.332500e-13  7.399670e-13  3.913858e-13 -1.324464e-13  4.786837e-13
#> Y1.L8 -1.337775e-15  6.673161e-16  0.000000e+00  7.885634e-15  2.243677e-15
#> Y2.L8 -2.573135e-15 -1.884193e-16  4.549152e-15  1.160132e-15  1.854712e-15
#> Y3.L8  4.740580e-15  0.000000e+00 -2.501340e-15 -2.635646e-15  2.839144e-15
#> Y4.L8  0.000000e+00 -1.797436e-15  4.488466e-15  7.460492e-16 -1.452254e-15
#> Y5.L8  9.907469e-16  1.438249e-15  6.791980e-17 -9.437176e-16  0.000000e+00

A VAR with HLag (elementwise) type of sparsity can be simulated using simVAR and by setting sparsity_pattern="HLag". Three extra options exist: (1) zero_min determines the minimum number of zero coefficients of each variable in each equation, (2) zero_max determines the maximum number of zero coefficients of each variable in each equation, and (3) zeroes_in_self is a boolean option that if TRUE, zero coefficients of the ith variable can exist in the ith equation.

periods <- 200
k <- 5 # Number of time series
p <- 5 # Maximum lag 
sparsity_options <- list(zero_min = 0, 
                         zero_max = 5, 
                         zeroes_in_self = TRUE)
sim_data <- simVAR(periods = periods, k = k, p = p, 
                   sparsity_pattern = "HLag", 
                   sparsity_options = sparsity_options,
                   seed = 123456, 
                   decay = 0.01)
Y.hlag <- sim_data$Y
summary(sim_data) # Obtaining a summary of the simulation
#> #### General Information #### 
#> 
#> Seed                                        123456 
#> Time series length                          200 
#> Burnin                                      200 
#> Variables Simulated                         5 
#> Number of Lags                              5 
#> Coefficients were randomly created?         TRUE 
#> Maximum Eigenvalue of Companion Matrix      0.8 
#> Sparsity Pattern                            hvar 
#> 
#> 
#> #### Sparsity Options #### 
#> 
#> $zero_min
#> [1] 0
#> 
#> $zero_max
#> [1] 5
#> 
#> $zeroes_in_self
#> [1] TRUE
#> 
#> 
#> 
#> #### Coefficient Matrix #### 
#> 
#>                  Y1            Y2            Y3            Y4            Y5
#> Y1.L1  0.000000e+00 -1.032031e-01 -1.327209e-01  3.270802e-02  8.420276e-01
#> Y2.L1  3.119710e-01  4.906592e-01  9.356382e-01  4.367547e-01 -1.593261e-01
#> Y3.L1 -3.724128e-01 -4.164610e-01 -2.083578e-02  4.390729e-01  3.937560e-01
#> Y4.L1  2.153655e-02 -2.748029e-01  3.478871e-01  0.000000e+00  2.092447e-01
#> Y5.L1 -2.818808e-01  4.697750e-01  1.439081e-02  7.086130e-02  1.729456e-01
#> Y1.L2  0.000000e+00  0.000000e+00  2.635259e-03  3.633353e-03 -2.319768e-03
#> Y2.L2 -3.199742e-03  2.600505e-04  0.000000e+00 -1.027691e-02 -4.224090e-03
#> Y3.L2 -3.221492e-03  5.832487e-03  0.000000e+00  3.903074e-03 -4.171920e-03
#> Y4.L2  0.000000e+00  3.618292e-03  6.394217e-04  0.000000e+00  5.917797e-04
#> Y5.L2 -1.876583e-03 -3.611195e-03 -4.252061e-04  4.059929e-03 -4.529864e-03
#> Y1.L3  0.000000e+00  0.000000e+00  1.201831e-05  0.000000e+00  5.747130e-05
#> Y2.L3 -1.269510e-05 -4.028146e-05  0.000000e+00 -9.449180e-06 -4.558191e-06
#> Y3.L3 -2.426287e-05  1.170464e-05  0.000000e+00 -3.128609e-05  2.244735e-05
#> Y4.L3  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00 -2.758734e-05
#> Y5.L3  0.000000e+00  2.110436e-05 -6.748333e-05  1.638772e-05 -5.387651e-05
#> Y1.L4  0.000000e+00  0.000000e+00  1.822951e-08  0.000000e+00 -8.607620e-07
#> Y2.L4  4.329159e-07 -2.634722e-07  0.000000e+00  0.000000e+00 -3.981346e-07
#> Y3.L4  0.000000e+00  3.170507e-07  0.000000e+00  1.954558e-07 -9.491777e-08
#> Y4.L4  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
#> Y5.L4  0.000000e+00  1.380464e-07  5.920428e-07  0.000000e+00  4.811983e-08
#> Y1.L5  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  3.228988e-09
#> Y2.L5  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
#> Y3.L5  0.000000e+00  0.000000e+00  0.000000e+00 -1.842696e-09 -1.005678e-09
#> Y4.L5  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
#> Y5.L5  0.000000e+00 -6.013512e-09 -8.495736e-09  0.000000e+00  0.000000e+00

Lasso estimation

The above simulated data (and truly any other data) could then be estimated using an L1-penalty (lasso penalty) on the autoregressive coefficients. To this end, set the VARpen argument in the sparseVAR function equal to L1. Note: it is recommended to standardise the data. Bigtime will give a warning if the data is not standardised but will not stop you from continuing. Setting selection="none", the default, allows us to specify the penalization we want. Furthermore, we can predefine the maximum lag-order by changing the p argument to the desired value. However, we do not recommend this, as the bigtime will, by default, choose a maximum lag-order that is well suited in many scenarios.

VAR.L1 <- sparseVAR(Y = scale(Y.lasso), # standardising the data
                    VARpen = "L1", # using lasso penalty
                    VARlseq = 1.5) # Specifying the penalty to be used. selection="none" is the default.

Tuning parameter selection

For the selection of the penalization parameter, we can either set the selection argument to "none", which would return a model for a sequence of penalizations, or use time series cross-validation by setting selection="cv", or finally we could also use any of BIC, AIC, or HQ information criteria by setting the selection arguments to any of "bic", "aic", or "hq" respectively. We will start of by using time series cross-validation and will therefore set selection="cv". The default is to use a cross-validation score based on one-step ahead predictions but you can change the default forecast horizon under the argument h.

VAR.L1 <- sparseVAR(Y=scale(Y.lasso),  # standardising the data
                   selection = "cv", # using time series cross-validation
                   VARpen = "L1") # using the lasso penalty

The plot_cv function can be used to investigate the cross-validation procedure. The returned plot shows the mean MSFE for each penalization together with error bars for plus-minus one standard deviation. The black dashed line indicates the penalty parameter choice that lead to the smallest MSFE in the CV procedure. The red dotted line, on the other hand, shows the one-standard-error solution. It picks the most parsimonious model within one standard error of the best cross-validation score. The latter is the one that is chosen by default in sparseVAR.

plot_cv(VAR.L1)

Further investigation into the model can be done by using the function lagmatrix, which returns the lagmatrix of the estimated autoregressive coefficients. If entry (i, j) = x, this means that the sparse estimator indicates the effect of time series j on time series i to last for x periods. Setting the returnplot argument to TRUE will return a heatmap for better visual inspection.

LhatL1 <- lagmatrix(fit=VAR.L1, returnplot=TRUE)

The lag matrix is typically sparse as it contains some empty (i.e., zero) cells. However, VAR models estimated with a standard L1-penalty are typically not easily interpretable as they select many high lag order coefficients (i.e., large values in the lagmatrix).

To circumvent this problem, we advise using a lag-based hierarchically sparse estimation procedure, which boils down to using the default option HLag for the VARpen argument. This estimation procedure encourages low maximum lag orders, often results in sparser lagmatrices, and hence more interpretable models.

Hlag estimation

Models can be estimated using the hierarchical penalization by using the default argument to VARpen, namely HLag. Model selection can again be done by either setting selection="none" and obtaining a whole sequence of models, or by using any of cv, bic, aic, hq.

VARHLag_none <- sparseVAR(Y=scale(Y.hlag), 
                          selection = "none") # HLag is the default VARpen option
VARHLag_cv <- sparseVAR(Y=scale(Y.hlag), 
                        selection = "cv")
VARHLag_bic <- sparseVAR(Y=scale(Y.hlag), 
                         selection = "bic") # This will also give a table for IC comparison showing the selected lambda for each IC
#> 
#> 
#> #### Selected the following lambda ####
#> 
#>      AIC      BIC       HQ 
#> 10.74023 10.74023 10.74023 
#> 
#> 
#> #### Details ####
#> 
#>      lambda         AIC         BIC          HQ
#> 1  138.7154     -0.9793     -0.9793     -0.9793
#> 2   83.1577     -1.7792     -1.6473     -1.7258
#> 3   49.8517     -3.0055     -2.7746      -2.912
#> 4   29.8853     -4.1453     -3.8155     -4.0118
#> 5   17.9158      -4.826     -4.4631     -4.6791
#> 6   10.7402 ==> -5.0769 ==> -4.6151   ==> -4.89
#> 7    6.4386     -5.0698     -4.3277     -4.7695
#> 8    3.8598     -4.6703     -3.0377     -4.0096
#> 9    2.3139     -2.7531      2.5737     -0.5974
#> 10   1.3872     -1.5512      6.5956      1.7457

Diagnostics

Depending on which selection procedure was used, various diagnostics can be produced. Former and foremost, all selection procedures support the fitted and residuals functions to obtain the fitted and residual values respectively. Both functions return a 3D array if the model used selection="none" corresponding to the fitted/residual values for each model in the penalization sequence.

res_VARHLag_none <- residuals(VARHLag_none) # This is a 3D array
dim(res_VARHLag_none)
#> [1] 179   5  10

When an actual selection method was used (cv, bic, aic, hq), then other diagnostic functions exist. For cv, plot_cv could be used again, just as shown above. For all, the diagnostics_plot function can be used to obtain a plot of fitted and residual values.

p_bic <- diagnostics_plot(VARHLag_bic, variable = "Y3") # variable argument can be numeric or character
p_cv <- diagnostics_plot(VARHLag_cv, variable = "Y3") # variable argument can be numeric or character

plot(p_bic)

plot(p_cv)

Vector AutoRegressive with Exogenous Variables (VARX) Models

Often practitioners are interested in incorporating the impact of unmodeled exogenous variables (X) into the VAR model. To do this, you can use the sparseVARX function which has an argument X where you can enter the data matrix of exogenous time series. For demonstration purposes, we will use the varx.example data set that is part of bigtime.

data(varx.example)

When applying the lagmatrix function to an estimated sparse VARX model, the lag matrices of both the endogenous and exogenous autoregressive coefficients are returned.

sparseVARX supports the same selection arguments as sparseVAR: none, cv, bic, aic, hq, and it is again recommended to standardise the data.

VARXfit_cv <- sparseVARX(Y=scale(Y.varx), X=scale(X.varx), selection = "cv")
LhatVARX <- lagmatrix(fit=VARXfit_cv, returnplot=TRUE)

VARX models also support plot_cv if estimated using CV. The returned plot shows the mean MSFE for each combination of penalizations in a heatmap. The x-axis show the penalizations for the exogenous variables, and the y-axis shows the penalizations for the endogenous variables. The big black dot in the plot below indicates the one-SE optimal choice, while the contours indicate the mean MSFE in the CV procedure. The red colour indicates a high MSFE, and light-yellow to yellow regions indicate low MSFEs.

plot_cv(VARXfit_cv)

If selection="none" a 3D array will be returned again. Although not mentioned previously, when setting selection to none, or any of the IC, one can also easily provide a penalization sequence, or even just ask for a single penalization setting. For example, below we intentionally choose a single, small penalization for the exogenous variables.

VARXfit_none <- sparseVARX(Y=scale(Y.varx), X=scale(X.varx), VARXlBseq = 0.001, selection = "none")
dim(VARXfit_none$Phihat) # This is a 3D array
#> [1]  3 63 10
# This is also 3D but third dimension is equal to ten 
# --> one penalization was chosen for B and 10 automatically for Phi
# --> Cross product makes 10
dim(VARXfit_none$Bhat) 
#> [1]  3 63 10

Other functions such as residuals, fitted, and diagnostics_plot are also supported.

Vector AutoRegressive Moving Average (VARMA) Models

VARMA models generalise VAR models and often allow for more parsimonious representations of the data generating process. To estimate a VARMA model to a multivariate time series data set, use the function sparseVARMA, and choose a desired selection method. The sparse VARMA estimation procedures consists of two stages: in the first stage a VAR model is estimated from which the residuals are retrieved. In the second stage these residuals are used as approximated error terms to estimate the VARMA model. As a default, sparseVARMA uses CV in the first stage and none in the second stage. The first stage does not support none: A selection needs to be made.

Now lag matrices are obtained for the autoregressive (AR) coefficients and the moving average (MAs) coefficients.

data(varma.example)
VARMAfit <- sparseVARMA(Y=scale(Y.varma), VARMAselection = "cv") # VARselection="cv" as default.  
LhatVARMA <- lagmatrix(fit=VARMAfit, returnplot=T)

Other functions such as plot_cv, residuals, fitted and diagnosticsplot are also supported.

Evaluating Forecast Performance

To obtain forecasts from the estimated models, you can use the directforecast function for VAR, VARMA, and VARX, or the recursiveforecast function for VAR models. The default forecast horizon (argument h) is set to one such that one-step ahead forecasts are obtained, but you can specify your desired forecast horizon.

Finally, we compare the forecast accuracy of the different models by comparing their forecasts to the actual time series values of the var.example data set that comes with bigtime. We will estimate all models using CV.

In this example, the VARMA model has the best forecast performance (i.e., lowest mean squared prediction error). This is somewhat surprising given the data comes from a VAR model.

data(var.example)
dim(Y.var)
#> [1] 200   5
Y <- Y.var[-nrow(Y.var), ] # leaving the last observation for comparison
Ytest <- Y.var[nrow(Y.var), ]

VARcv <- sparseVAR(Y = scale(Y), selection = "cv")
VARMAcv <- sparseVARMA(Y = scale(Y), VARMAselection = "cv")

Y <- Y.var[-nrow(Y.var), 1:3] # considering first three variables as endogenous
X <- Y.var[-nrow(Y.var), 4:5] # and last two as exogenous
VARXcv <- sparseVARX(Y = scale(Y), X = scale(X), selection = "cv")

VARf <- directforecast(VARcv) # default is h=1
VARXf <- directforecast(VARXcv)
VARMAf <- directforecast(VARMAcv)

# We can only compare forecasts for the first three variables
# because VARX models only forecast endogenously modelled variables
mean((VARf[1:3]-Ytest[1:3])^2)
#> [1] 0.6252039
mean((VARXf[1:3]-Ytest[1:3])^2)
#> [1] 0.6319843
mean((VARMAf[1:3]-Ytest[1:3])^2) # lowest=best
#> [1] 0.4571325

Note that we could easily obtain longer horizon forecasts for the VAR model by using the recursiveforecast function. It is recommended to call is.stable first though, to check whether the obtained VAR model is stable.

is.stable(VARcv)
#> [1] TRUE
rec_fcst <- recursiveforecast(VARcv, h = 10)
plot(rec_fcst, series = "Y2", last_n = 50) # Plotting of a recursive forecast

Univariate Models

The functions sparseVAR, sparseVARX, sparseVARMA can also be used for the univariate setting where the response time series Y is univariate. Below we illustrate the usefulness of the sparse estimation procedure as automatic lag selection procedures.

AutoRegressive (AR) Models

We start by generating a time series of length n = 50 from a stationary AR model and by plotting it. The sparseVAR function can also be used in the univariate case as it allows the argument Y to be a vector.

The lagmatrix function gives the selected autoregressive order of the sparse AR model. The true order is one.

periods <- 50
k <- 1
p <- 1
sim_data <- simVAR(periods, k, p, 
                   sparsity_pattern = "none", 
                   max_abs_eigval = 0.5, 
                   seed = 123456)
summary(sim_data)
#> #### General Information #### 
#> 
#> Seed                                        123456 
#> Time series length                          50 
#> Burnin                                      50 
#> Variables Simulated                         1 
#> Number of Lags                              1 
#> Coefficients were randomly created?         TRUE 
#> Maximum Eigenvalue of Companion Matrix      0.5 
#> Sparsity Pattern                            none 
#> 
#> 
#> #### Sparsity Options #### 
#> 
#> NULL
#> 
#> 
#> #### Coefficient Matrix #### 
#> 
#>           [,1]
#> [1,] 0.4998982

y <- scale(sim_data$Y)
ARfit <- sparseVAR(Y=y, selection = "cv")
lagmatrix(ARfit)
#> $LPhi
#>      [,1]
#> [1,]    1

Note that all diagnostics functions discussed for the VAR, VARMA, VARX cases also work for univariate cases; so do the forecasting functions.

AutoRegressive with Exogenous Variables (ARX) Models

We start by generating a time series of length n = 50 from a stationary ARX model and by plotting it. The sparseVARX function can also be used in the univariate case as it allows the arguments Y and X to be vectors. The lagmatrix function gives the selected endogenous (under LPhi) and exogenous autoregressive (under LB) orders of the sparse ARX model. The true orders are one.

periods <- 50
k <- 1
p <- 1
burnin <- 100
Xsim <- simVAR(periods+burnin+1, k, p, max_abs_eigval = 0.8, seed = 123)
edist <- lag(Xsim$Y)[-1, ] + rnorm(periods + burnin, sd = 0.1)
Ysim <- simVAR(periods, k , p, max_abs_eigval = 0.5, seed = 789, 
               e_dist = t(edist), burnin = burnin)
plot(Ysim)

x <- scale(Xsim$Y[-(1:(burnin+1))])
y <- scale(Ysim$Y)

ARXfit <- sparseVARX(Y=y, X=x, selection = "cv")
lagmatrix(fit=ARXfit)
#> $LPhi
#>      [,1]
#> [1,]    1
#> 
#> $LB
#>      [,1]
#> [1,]    2

AutoRegressive Moving Average (ARMA) Models

We start by generating a time series of length n = 50 from a stationary ARMA model and by plotting it. The sparseVARMA function can also be used in the univariate case as it allows the argument Y to be a vector. The lagmatrix function gives the selected autoregressive (under LPhi) and moving average (under LTheta) orders of the sparse ARMA model. The true orders are one.

periods <- 50
k <- 1
p.u <- 1
p.y <- 1
burnin <- 100

u <- rnorm(periods + 1 + burnin, sd = 0.1)
u <- embed(u, dimension = p.u + 1)
u <- u * matrix(rep(c(1, 0.2), nrow(u)), nrow =  nrow(u), byrow = TRUE) # Second column is lagged, first is current error
edist <- rowSums(u)

Ysim <- simVAR(periods, k, p, e_dist = t(edist), 
               max_abs_eigval = 0.5, seed = 789, 
               burnin = burnin)
summary(Ysim)
#> #### General Information #### 
#> 
#> Seed                                        789 
#> Time series length                          50 
#> Burnin                                      100 
#> Variables Simulated                         1 
#> Number of Lags                              1 
#> Coefficients were randomly created?         TRUE 
#> Maximum Eigenvalue of Companion Matrix      0.5 
#> Sparsity Pattern                            none 
#> 
#> 
#> #### Sparsity Options #### 
#> 
#> NULL
#> 
#> 
#> #### Coefficient Matrix #### 
#> 
#>           [,1]
#> [1,] 0.4989384

ARMAfit <- sparseVARMA(Y=y, VARMAselection = "cv")
lagmatrix(fit=ARMAfit)
#> $LPhi
#>      [,1]
#> [1,]    3
#> 
#> $LTheta
#>      [,1]
#> [1,]    0

Additional Resources

For a non-technical introduction to VAR models see this interactive notebook and for an interactive notebook demonstrating the use of bigtime for high-dimensional VARs, see this notebook. Note: Loading these notebooks sometimes can take quite some time. Please be patient or try another time.

References:

  • Nicholson William B., Wilms Ines, Bien Jacob and Matteson David S. (2020), “High-dimensional forecasting via interpretable vector autoregression”, Journal of Machine Learning Research, 21(166), 1-52.

  • Wilms Ines, Basu Sumanta, Bien Jacob and Matteson David S. (2021), “Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages”, Journal of the American Statistical Association, doi: 10.1080/01621459.2021.1942013.

  • Wilms Ines, Basu Sumanta, Bien Jacob and Matteson David S. (2017), “Interpretable Vector AutoRegressions with Exogenous Time Series”, NIPS 2017 Symposium on Interpretable Machine Learning, arXiv:1711.03623