SugiharaLab/pyEDM

RuntimeError: Lapack_SVD(): dgelss failed

Closed this issue · 12 comments

UhP88 commented

I'm new to Python and have been using pyEDM to carry out CCM and S-map. The CCM step runs well, but the SMap function raises a runtime error for only some species pairs:
RuntimeError: Lapack_SVD(): dgelss failed
My data do not include NAs.
smap_sp1 = pyEDM.SMap(dataFrame=df, lib="1 100", pred="1 100", columns=sp1, target=sp2, E=int(best_embed_sp1), theta=int(best_theta_sp1), showPlot=False)
My data structure is as follows. The index contains date values and the sp. columns contain normalized count data:
index       sp.A.  sp.B.   sp.C.   sp.D.   sp.E.
1991-06-10  -0.04  -0.122  -0.064  -0.118  -0.0242
1991-06-19  -0.04  -0.256  -0.064  -0.110  -0.0121
I really need some help to solve this issue.

Thank you for reporting this issue.

Please specify the platform and version.

Version can be shown as:

>>> import pyEDM
>>> pyEDM.__version__
'1.10.2.0'

Did you install from the PyPI pyEDM repository?

UhP88 commented

This probably means that LAPACK is not installed or is not on the library path.

On Ubuntu you can check for LAPACK like so:

dpkg -L liblapack3

On Redhat/CentOS, something like this should work:

sudo yum list installed | grep lapack

This seems a bit strange, as a working Linux environment with NumPy or SciPy would have BLAS/LAPACK.

>>> import numpy.distutils.system_info as sysinfo
>>> sysinfo.get_info('lapack')
{'libraries': ['lapack', 'lapack'], 'library_dirs': ['/usr/lib/x86_64-linux-gnu'], 'language': 'f77'}
> ls /usr/lib/x86_64-linux-gnu/liblapack*
/usr/lib/x86_64-linux-gnu/liblapack.a      /usr/lib/x86_64-linux-gnu/liblapack.so
/usr/lib/x86_64-linux-gnu/liblapack_pic.a  /usr/lib/x86_64-linux-gnu/liblapack.so.3
UhP88 commented

Good news: it isn't an installation problem.

Any NaNs in the data? There are checks in the code to detect them since LAPACK doesn't handle them, but perhaps they are not catching all instances.
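A quick way to rule this out independently of pyEDM is to count non-finite entries per column with pandas and NumPy. This is only a sketch: the helper name `report_non_finite` and the small demo frame are made up for illustration and are not part of pyEDM.

```python
import numpy as np
import pandas as pd

def report_non_finite(df):
    """Count NaN and +/-inf entries in each numeric column of df."""
    numeric = df.select_dtypes(include="number")
    values = numeric.to_numpy(dtype=float)
    counts = (~np.isfinite(values)).sum(axis=0)
    return pd.Series(counts, index=numeric.columns)

# Hypothetical demo frame; replace with the data frame passed to SMap.
demo = pd.DataFrame({"sp_A": [0.1, np.nan, 0.3],
                     "sp_B": [1.0, 2.0, np.inf]})
bad_counts = report_non_finite(demo)
print(bad_counts)
```

If every count is zero, NaN or infinite input values can be excluded as the cause.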

It seems the LAPACK function dgelss() is returning an error code, which raises the error with the message: Lapack_SVD(): dgelss failed. The returned error code is not reported in the error message. This needs to be fixed.

If the problem is not NaN in the data, the problem may be ill-posed. Here is the error code description from LAPACK; I presume you are hitting the INFO > 0 case.

          INFO is INTEGER
          = 0:  successful exit
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  the algorithm for computing the SVD failed to converge;
                if INFO = i, i off-diagonal elements of an intermediate
                bidiagonal form did not converge to zero.

If that is the case, we can check this with an independent solution code.
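As a rough sketch of such an independent check, one could build the time-delay embedding by hand and inspect the singular values and numerical rank of the regression design matrix with NumPy. The helpers `delay_embed` and `design_diagnostics` are hypothetical names, the embedding layout is an assumption and may differ from what pyEDM builds internally, and NumPy's SVD is not the dgelss routine itself.

```python
import numpy as np

def delay_embed(x, E):
    """Rows are [x(t), x(t-1), ..., x(t-E+1)] for every t with a full history."""
    x = np.asarray(x, dtype=float)
    n = len(x) - E + 1
    return np.column_stack([x[E - 1 - k : E - 1 - k + n] for k in range(E)])

def design_diagnostics(x, E):
    """Singular values and numerical rank of the [intercept, embedding] matrix."""
    A = delay_embed(x, E)
    A = np.column_stack([np.ones(len(A)), A])  # intercept column
    s = np.linalg.svd(A, compute_uv=False)
    return s, np.linalg.matrix_rank(A)

# A constant series versus a varying one (both synthetic).
constant = np.full(100, -0.048507)
s_const, rank_const = design_diagnostics(constant, E=5)

rng = np.random.default_rng(0)
varying = rng.normal(size=100)
s_var, rank_var = design_diagnostics(varying, E=5)

print("constant series rank:", rank_const)
print("varying series rank: ", rank_var)
```

A rank well below the number of columns (E + 1 here) indicates that the least-squares problem is degenerate for that library.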

UhP88 commented

It sounds like the data (embedding and target) are producing an ill-conditioned problem for the SVD.

What value of theta are you using? theta = 0 applies uniform weights instead of exponential ones.
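To illustrate the effect of theta, here is a minimal sketch of the usual S-map weighting, w_i = exp(-theta * d_i / mean(d)), where d_i is the distance from the prediction point to library point i. The function name `smap_weights` and the sample distances are invented for illustration, and the exact formula used inside pyEDM is assumed rather than checked.

```python
import numpy as np

def smap_weights(distances, theta):
    """Exponential localisation weights, assuming w = exp(-theta * d / mean(d))."""
    d = np.asarray(distances, dtype=float)
    return np.exp(-theta * d / d.mean())

d = np.array([0.1, 0.5, 1.0, 2.0])   # made-up neighbour distances
w0 = smap_weights(d, theta=0)        # uniform: every weight equals 1
w3 = smap_weights(d, theta=3)        # nearer points get larger weights

print(w0)
print(w3)
```

With theta = 0 the fit reduces to an ordinary (global) linear regression; larger theta values concentrate the weight on the nearest library points.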

I wonder if using the Ridge regression solver with regularization would help?

Here's an example you should be able to follow in your code.

>>> from pyEDM import *
>>> from sklearn.linear_model import Ridge
>>> solver = Ridge( alpha = 0.5 )
>>> sm = SMap( dataFrame = sampleData['circle'], lib = "1 100", pred = "101 198", embedded = True, E = 2, theta = 3.14, columns = "x y", target = "x", showPlot = True, solver = solver )
UhP88 commented

Sorry for taking a long time to respond.
We have been using the pyEDM.PredictNonlinear function and took the theta value at maximum rho, so theta is not equal to 0.
I also tried the Ridge regression solver, but it did not resolve the error.
Later I removed the theta parameter from the pyEDM.SMap call, and the code then ran for all species, although the output graphs were quite different from those with an optimized theta. I wonder what kind of theta values could give rise to this error in Lapack_SVD(): dgelss.

Thank you, as always, for the assistance.

Thanks for the feedback.

Using PredictNonlinear() with no theta argument will use a default set of theta values:

ThetaValues( { 0.01, 0.1, 0.3, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9 } )

calling SMap for each theta. You suggest this works for the data in question.

Calling SMap with no theta argument defaults to theta = 0, the global linear map. You suggest this also works for the data in question.

This seems to suggest that something else, not the value of theta, could be the problem?

Perhaps we should look at your function call parameters? I'm willing to look at the data as time allows.

UhP88 commented

I'm truly grateful for your help and feedback. Is there a way I could send you the pipeline code I am using and a sample data set that causes the error, other than through this public forum?

Thanks for the data and code. Almost certainly this is an ill-posed problem for the LAPACK dgelss solver.

>>> from pyEDM import *
>>> from pandas import read_csv
>>> df = read_csv('TestRun.csv')
>>> df.iloc[:, 1:4].quantile( [0.1, 0.25, 0.5, 0.75, 0.9], 'rows' )
        Sp_001    Sp_002
0.10 -0.048507 -0.122024
0.25 -0.048507 -0.122024
0.50 -0.048507 -0.122024
0.75 -0.048507 -0.122024
0.90 -0.048507 -0.122024

You are trying to solve a static problem with constant values. This works with theta = 0 since it is a multiple linear regression with unity weights.

I don't think SMap is the right tool to use on data with 1600 out of 1700 values as constant.

If you really insist, the computational problem is that the library is full of constant, equidistant points since you only use the first 100 points. Indeed:

>>> SMap( dataFrame = df.iloc[:, 1:4], columns = 'Sp_001', target = 'Sp_002', E = 5, 
          theta = 2.2, lib = '1 100', pred = '101 105' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpark/.local/lib/python3.8/site-packages/pyEDM/CoreEDM.py", line 219, in SMap
    D = pyBindEDM.SMap( pathIn,
RuntimeError: Lapack_SVD(): dgelss failed.
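Before calling SMap, a check along the following lines could flag a library window that carries no variation. It is a sketch on synthetic stand-in data (the real TestRun.csv is not reproduced here), and `library_spread` is a hypothetical helper. The last lines show one possible mechanism for the failure, under the assumption that the weights have the form exp(-theta * d / mean(d)): when all distances are zero the ratio is 0/0 and the weights become NaN. Whether this is exactly what happens inside the library has not been confirmed.

```python
import numpy as np
import pandas as pd

def library_spread(df, columns, lib_start, lib_end):
    """Std and distinct-value count per column within 1-based inclusive lib rows."""
    window = df.iloc[lib_start - 1 : lib_end][columns]
    return pd.DataFrame({"std": window.std(), "n_unique": window.nunique()})

# Synthetic stand-in: 1600 constant values followed by 100 varying ones.
n = 1700
rng = np.random.default_rng(1)
sp1 = np.full(n, -0.048507)
sp1[1600:] = rng.normal(size=100)
demo = pd.DataFrame({"Sp_001": sp1})

narrow = library_spread(demo, ["Sp_001"], 1, 100)    # like lib = '1 100'
wide = library_spread(demo, ["Sp_001"], 1, 1700)     # like lib = '1 1700'
print(narrow)
print(wide)

# Assumed weight formula: with all-zero distances it evaluates 0/0.
d = np.zeros(5)
with np.errstate(invalid="ignore"):
    w = np.exp(-2.2 * d / d.mean())
print(w)
```

A column with a single distinct value inside the library window is a sign that the local regression has nothing to fit there.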

With a library that covers all the data, there is enough variance to avoid the singular failure, but I doubt the results are meaningful:

>>> SMap( dataFrame = df.iloc[:, 1:4], columns = 'Sp_001', target = 'Sp_002', E = 5, 
          theta = 2.2, lib = '1 1700', pred = '101 105' )
{'predictions':         index  Observations  Predictions  Pred_Variance
0  1993/07/19     -0.122024          NaN            NaN
1  1993/07/27     -0.122024     0.002114       1.016464
2  1993/07/28     -0.122024     0.002114       1.016464
3  1993/08/09     -0.122024     0.002114       1.016464
4  1993/08/18     -0.122024     0.002114       1.016464
5  1993/08/30     -0.122024     0.002114       1.016464, 

'coefficients':         index        C0  ...  ∂Sp_001(t-3)/∂Sp_002  ∂Sp_001(t-4)/∂Sp_002
0  1993/07/19       NaN  ...                   NaN                   NaN
1  1993/07/27  0.002089  ...             -0.000101             -0.000101
2  1993/07/28  0.002089  ...             -0.000101             -0.000101
3  1993/08/09  0.002089  ...             -0.000101             -0.000101
4  1993/08/18  0.002089  ...             -0.000101             -0.000101
5  1993/08/30  0.002089  ...             -0.000101             -0.000101

[6 rows x 7 columns]}
UhP88 commented