Dim reduction on big dataset
MislavSag opened this issue · 6 comments
Great package.
Is the package suitable for very big datasets? I am talking about datasets of dimension 1,000,000 x 300.
I have just tried this code:
mod1 <- constructModel(data_sample, p = 4, "Basic", gran = c(150, 10), RVAR = FALSE, h = 1, cv = "Rolling", MN = FALSE, verbose = FALSE, IC = TRUE)
results = cv.BigVAR(mod1)
and it is pretty slow even with a 1000 x 100 X matrix (about 10 minutes).
My goal is to do dimension reduction, but I am not sure whether your package is appropriate for this.
Time series with those dimensions (large T, small k) should be feasible in this framework, but rolling validation for penalty parameter selection is not advisable, since it will be very computationally intensive. I would instead suggest something like n-fold cross validation, as described in section 3.2 of http://www.wbnicholson.com/BigVAR.html.
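To illustrate the idea of n-fold cross validation on a long series, here is a minimal sketch in base R that only builds the fold indices. It is not the tutorial's code and not part of BigVAR: `make_folds`, `T_obs`, and `n_folds` are hypothetical names chosen for this example, and contiguous blocks are an assumption about how one might split a time series.

```r
# Hypothetical helper (not part of BigVAR): split the time indices
# 1..T_obs into n_folds contiguous blocks, so that each block can be
# held out once while the model is fit on the rest.
make_folds <- function(T_obs, n_folds) {
  idx <- seq_len(T_obs)
  # Map each time index to a fold number in 1..n_folds.
  fold_id <- ceiling(idx * n_folds / T_obs)
  split(idx, fold_id)
}

folds <- make_folds(T_obs = 1000, n_folds = 5)

# Number of observations held out in each fold.
print(lengths(folds))

# Indices used for fitting when fold 1 is held out.
train_idx <- setdiff(seq_len(1000), folds[[1]])
```

With a scheme like this, each candidate penalty value needs only `n_folds` fits, which is the reason it tends to be cheaper than refitting as a rolling window advances.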
One way to potentially improve performance is to ensure that BLAS and OpenMP are single-threaded. You can do so by adding the following code to your .Rprofile:
```r
library(RhpcBLASctl)
blas_set_num_threads(1)
omp_set_num_threads(1)
```
I have returned to your answer after some time :)
I have just tried to implement CV from this tutorial: http://www.wbnicholson.com/BigVAR.html#n-fold-cross-validation
The CV part is in section 3.2.
When I execute the NFoldcv function, it returns an error:
Error in 2:nrow(Z1) : argument of length 0
The problem is that Z1 is a list of two elements: Y and Z. So instead of Z1, should it be Z1$Z or Z1$Y in the line trainZ <- Z1[2:nrow(Z1),]?
Yes, it should be Z1$Z. I will make the correction.
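For anyone hitting the same message, the error can be reproduced in base R without BigVAR. The sketch below uses a made-up `Z1` (a list of two random matrices with arbitrary dimensions) as a stand-in for the object in the tutorial; only the list-of-two-elements shape is taken from this thread.

```r
# Hypothetical stand-in for Z1: a list with elements Y and Z.
# The dimensions are arbitrary and chosen only for illustration.
set.seed(1)
Z1 <- list(
  Y = matrix(rnorm(20), nrow = 10, ncol = 2),
  Z = matrix(rnorm(50), nrow = 5, ncol = 10)
)

# A plain list has no dim attribute, so nrow() returns NULL.
print(is.null(nrow(Z1)))

# 2:NULL is what raises "argument of length 0".
err <- tryCatch(
  Z1[2:nrow(Z1), ],
  error = function(e) conditionMessage(e)
)
print(err)

# Indexing the matrix inside the list works as intended.
trainZ <- Z1$Z[2:nrow(Z1$Z), ]
print(dim(trainZ))
```

So the fix is to index the matrix element of the list rather than the list itself.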
Thanks. I think this can be closed now.
It seems this is not solved?
This has been fixed now.