WARNING: ComputeError(): nan predictions found error not computed.
I have the same issue running rEDM as the one described here: SugiharaLab/pyEDM#23 (comment)
I read the whole issue and tried some of the solutions, but none of them worked. My pipeline is built on rEDM, so switching to pyEDM would be time-consuming. Is there any way to run rEDM with multiprocessing in R? How can I implement it?
thanks!
From my tests, the warning does not seem to be related to the starting libSize. It is unclear why I am getting this warning or how to solve it. I tried several starting points for the library sizes and none of them worked. Every time the warning appears, the ccm function returns 0 for every library size.
Thanks for the issue submission.
Please report the following:
library(rEDM)
sessionInfo()
Please give an example of the offending function call.
Regarding the question of how to multiprocess with CCM, a set of R functions that do this in various ways using foreach %dopar% and clusterApply are shown here.
Please note that since CCM is multithreaded, computing both forward and reverse mappings, the number of cores applied to these functions should be less than 1/2 the number of cores available.
Also note that mclapply should not be used with multithreaded functions/applications such as CCM, Multiview, EmbedDimension, PredictNonlinear, and PredictInterval.
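As a rough illustration of the advice above, here is a minimal sketch using clusterApply from the base parallel package. The helper names n_workers and run_ccm_pairs are hypothetical (they are not part of rEDM), and the CCM arguments are the ones that appear elsewhere in this thread. The worker count is kept below half the available cores because each CCM call is itself multithreaded.

```r
library(parallel)

# Hypothetical helper: choose a worker count strictly below half of the
# available cores, with a floor of one worker.
n_workers <- function(available = parallel::detectCores()) {
  if (is.na(available)) return(1L)   # detectCores() can return NA
  max(1L, as.integer(ceiling(available / 2)) - 1L)
}

# Hypothetical helper: run CCM for several (column, target) pairs on a
# socket cluster. Requires rEDM to be installed on the workers.
run_ccm_pairs <- function(df, pairs, E, libSizes, sample = 100) {
  cl <- makeCluster(n_workers())
  on.exit(stopCluster(cl))
  clusterApply(cl, pairs, function(p, df, E, libSizes, sample) {
    rEDM::CCM(dataFrame = df, E = E, columns = p[1], target = p[2],
              libSizes = libSizes, sample = sample)
  }, df = df, E = E, libSizes = libSizes, sample = sample)
}

# Possible usage (not run here):
# pairs <- list(c("V1", "V5"), c("V2", "V5"))
# res <- run_ccm_pairs(rEDM::Lorenz5D, pairs, E = 5, libSizes = "50 500 50")
```

A socket cluster is used rather than mclapply, in line with the note that forking should be avoided with the multithreaded rEDM functions.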
Thanks for the prompt response! I am very busy these days and haven't been able to get back to my code. Thanks for the help with the multiprocessing code too! I promise I'll take a look sometime next week and get back to you.
Hi, I am using the rEDM package in R, but there are major gaps in our data. Clearly, if there are NAs, the rEDM functions (e.g., EmbedDimension, PredictNonlinear, CCM) either do not run or report errors in prediction.
ccm_out_tau <- CCM(dataFrame = df3, E = 6, Tp = 0, columns = "ET", target = "PET", tau = tau,
libSizes = c(10, 962, 5), sample = 100, replacement = F, showPlot = TRUE)
WARNING: ComputeError(): nan predictions found error not computed
theta.rho1 <- PredictNonlinear( dataFrame=df3, E=6,lib="1 962",theta = "",
pred="1 962", columns="ET", target="ET", showPlot = TRUE)
Error in RtoCpp_PredictNonlinear(pathIn, dataFile, dataFrame, pathOut, :
Lapack_SVD(): dgelss failed.
I'm wondering if you can help me solve this issue.
Thank you.
Thank you for the report.
As we have discussed via email, these are warnings and error traps, since non-sequential data violate a presumption of the Takens time-delay embedding theorem (Simplex, CCM, etc.), and the LAPACK routine dgelss used to solve the SMap linear system does not allow NA.
Accordingly, these are not software bugs, but input data inadequacies.
Some comments on the issue may perhaps be useful:
The first step in EDM (unless one is using a multivariate, non-time-delay-embedded state space) is to "recreate" the state space with time-delay embedding: Takens theorem. It is presumed that data are contiguous in time, as the time-delay operation shifts the time series data vectors successively by τ. I am unsure of the ramifications of violating this presumption; clarity on this issue is most welcome.
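To make the effect of a gap concrete, here is a small base-R sketch of a time-delay embedding. The function delay_embed is a hypothetical illustration, not the rEDM implementation, and it takes tau as a positive lag step for simplicity. It shows that a single NA in the series ends up in E different state-space vectors.

```r
# Hypothetical illustration: build the embedding matrix with columns
# x(t), x(t - tau), ..., x(t - (E - 1) * tau). Not the rEDM internals.
delay_embed <- function(x, E, tau = 1) {
  n <- length(x)
  start <- (E - 1) * tau + 1
  sapply(0:(E - 1), function(k) x[(start - k * tau):(n - k * tau)])
}

x <- c(1, 2, 3, NA, 5, 6, 7, 8)   # one missing observation
emb <- delay_embed(x, E = 3, tau = 1)

# Count the state-space vectors that contain the missing value
n_bad <- sum(!complete.cases(emb))
print(emb)
print(n_bad)
```

With E = 3, the single gap appears once in each lag column, so three of the six embedded vectors are incomplete.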
In functions that use Tp != 0 (time to prediction), we are also presuming that the "step" in the state space between points equates to one time step.
However, if one is not using a time-delay embedding but a multivariate data set (embedded = TRUE), AND Tp = 0, then missing data do not violate any specific presumptions, other than the general one that the state space is a complete representation of the system (and presuming the missing data are sparse).
Even so, the Simplex-based routines (CCM, Multiview, EmbedDimension, etc.) do not prevent NA, nor do they use the time vector (dataFrame first column) for state-space computations. The time vector is used only for tracking observation vectors, to report the correct observation time in the predictions. Therefore, NA are not forbidden, as is easy to demonstrate. A prediction that uses a state-space vector with NA is returned as NA. If statistics are then computed on a prediction vector with NA, the result is: WARNING: ComputeError(): nan predictions found error not computed.
It is up to the user to be aware of the limitations of the data and violations of the underlying presumptions.
I often use statistical filling or interpolation to fill data gaps.
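As one simple example of gap filling, linear interpolation can be done with approx from base R. This is only a sketch: the helper name fill_gaps_linear is hypothetical, and whether linear interpolation is appropriate depends on the data and on how long the gaps are.

```r
# Hypothetical helper: fill interior NA by linear interpolation over the
# index (or a supplied time vector). Leading/trailing NA remain NA.
fill_gaps_linear <- function(x, t = seq_along(x)) {
  ok <- !is.na(x)
  approx(t[ok], x[ok], xout = t, rule = 1)$y
}

x <- c(1, NA, 3, NA, NA, 6)
filled <- fill_gaps_linear(x)
print(filled)
```

Interpolated values are not observations, so results computed on filled data should be interpreted with that in mind.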
Please note that in CCM, replacement = TRUE is not recommended. This introduces a bias since it has the potential to reuse degenerate states in the library used to compute the Simplex result for a specific sample evaluation.
If the number of NA is small and produces only a few NA in the Simplex output, perhaps you can process CCM statistics by ignoring NA in the Predictions. The code simply reports that it was unable to compute the statistics since there were NA in the Predictions, hence the WARNING. The Predictions can be returned using includeData = TRUE in CCM, in which case the returned object is a list with CCM1_Predictions and CCM2_Predictions containing the result of each sample Simplex prediction. Note, however, that the list is not named to specify which library size was used. You can work around this by doing separate runs with a single libSizes = X argument for each library size.
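A minimal sketch of computing the correlation while ignoring NA, using only base R. The helper rho_ignoring_na is hypothetical, and the column names Observations and Predictions mentioned in the comment are an assumption about the layout of the returned prediction data frames, so check them against your own output.

```r
# Hypothetical helper: Pearson correlation over the rows where both the
# observation and the prediction are present.
rho_ignoring_na <- function(obs, pred) {
  cor(obs, pred, use = "complete.obs")
}

obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.1, NA, 2.9, 4.2, NA)

rho_all   <- cor(obs, pred)              # NA, because pred contains NA
rho_clean <- rho_ignoring_na(obs, pred)  # uses the 3 complete pairs

# Possible application to CCM output (assumed column names, not run here):
# out <- CCM(..., libSizes = 100, includeData = TRUE)
# sapply(out$CCM1_Predictions,
#        function(p) rho_ignoring_na(p$Observations, p$Predictions))
```

The number of pairs dropped should be reported alongside the statistic, since rho computed on few complete pairs is less reliable.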
Perhaps investigation comparing synthetic time series with/without gaps can provide bounds on the differences.
Hi there, another question, regarding the tp parameter in the ccm function. I used the rEDM package some years ago, and at that time the ccm function accepted tp < 0. Now it does not. Why? Does this have to do with the new default of the function, which runs the maps in both directions simultaneously? How can I request a map, say X:Y, where the target time for prediction is negative? This is an attempt to investigate how delayed the causal relations can be.
When I try tp = -1, I get the following message:
Error in RtoCpp_CCM(pathIn, dataFile, dataFrame, pathOut, predictFile, : CrossMap(): Tp = -1 is inconsistent with maximum libSize = 523
This is an error trap expressing that the requested library size exceeds the size of the library that was created.
> library( rEDM )
> sessionInfo()
other attached packages:
[1] rEDM_1.10.2
> df = Lorenz5D
> CCM( dataFrame = df, E = 5, Tp = -1, columns = 'V1', target = 'V5', libSizes = "50 995", sample = 10 )
  LibSize  V1:V5  V5:V1
1      50 0.6254 0.8019
2     995 0.8468 0.9141
>
> CCM( dataFrame = df, E = 5, Tp = -1, columns = 'V1', target = 'V5', libSizes = "50 1000", sample = 10 )
Error in RtoCpp_CCM(pathIn, dataFile, dataFrame, pathOut, predictFile, :
CrossMap(): Tp = -1 is inconsistent with maximum libSize = 1000
How can I determine a suitable library size to quickly detect delayed effects when testing tp < 0? I know I owe you a working example. In my test I was using the whole series length for libSizes; when I subtracted 2 from it, it worked for tp < 0. Is there any way to estimate this maximum size so that it works for both tp > 0 and tp < 0?
This depends on whether Tp is positive or negative, and on tau, E, and the number of observations.
Something like this should work, where N is the number of observation rows:
A = abs( tau ) * (E-1)
B = (Tp + 1)
Tp < 0 : libSize_max = N - A + B
Tp > 0 : libSize_max = N - A
Thanks for the formulas!
For me, the formula that worked was a little different:
Tp<0: libSize_max = N - A + B -2
Tp>0: libSize_max = N - A
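The formulas in this exchange can be sketched as a small R function. The name lib_size_max and the neg_adjust argument are hypothetical. The default follows the formula as first given, and neg_adjust = -2 reproduces the variant reported to work in the last comment. Treating Tp = 0 like the Tp > 0 case is an assumption, since the thread does not cover it.

```r
# Hypothetical helper implementing the rule of thumb from this thread.
#   A = abs(tau) * (E - 1),  B = Tp + 1
#   Tp < 0 : N - A + B (+ neg_adjust)
#   Tp > 0 : N - A       (Tp == 0 handled the same way: an assumption)
lib_size_max <- function(N, E, tau = -1, Tp = 0, neg_adjust = 0) {
  A <- abs(tau) * (E - 1)
  B <- Tp + 1
  if (Tp < 0) N - A + B + neg_adjust else N - A
}

# Example values, assuming N = 1000 rows, E = 5, tau = -1
max_neg     <- lib_size_max(N = 1000, E = 5, tau = -1, Tp = -1)
max_neg_adj <- lib_size_max(N = 1000, E = 5, tau = -1, Tp = -1, neg_adjust = -2)
max_pos     <- lib_size_max(N = 1000, E = 5, tau = -1, Tp = 1)
```

Because the two comments disagree by 2 for Tp < 0, it may be safest to treat the result as an upper bound and confirm it with a trial CCM call on your own data.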