RuntimeError: Lapack_SVD(): dgelss failed
I'm new to Python and have been using pyEDM to carry out CCM and S-map analyses. The CCM step runs well, but the SMap function gives a runtime error for only some species pairs:
RuntimeError: Lapack_SVD(): dgelss failed
My data do not include NAs
smap_sp1 = pyEDM.SMap(dataFrame=df, lib="1 100", pred="1 100", columns=sp1, target=sp2, E=int(best_embed_sp1), theta=int(best_theta_sp1), showPlot=False)
My data structure is as follows: the index contains date values, and the sp. columns contain normalized count data.
index       sp.A.  sp.B.   sp.C.   sp.D.   sp.E.
1991-06-10  -0.04  -0.122  -0.064  -0.118  -0.0242
1991-06-19  -0.04  -0.256  -0.064  -0.110  -0.0121
I really need some help to solve this issue.
Thank you for reporting this issue.
Please specify the platform and version.
Version can be shown as:
>>> import pyEDM
>>> pyEDM.__version__
'1.10.2.0'
Did you install from the PyPI pyEDM repository?
This probably means that LAPACK is not installed or in the path.
On Ubuntu you can check for LAPACK like so:
dpkg -L liblapack3
On Redhat/CentOS, something like this should work:
sudo yum list installed | grep lapack
This seems a bit strange, as a workable linux environment with numpy or scipy would have blas/lapack.
>>> import numpy.distutils.system_info as sysinfo
>>> sysinfo.get_info('lapack')
{'libraries': ['lapack', 'lapack'], 'library_dirs': ['/usr/lib/x86_64-linux-gnu'], 'language': 'f77'}
> ls /usr/lib/x86_64-linux-gnu/liblapack*
/usr/lib/x86_64-linux-gnu/liblapack.a /usr/lib/x86_64-linux-gnu/liblapack.so
/usr/lib/x86_64-linux-gnu/liblapack_pic.a /usr/lib/x86_64-linux-gnu/liblapack.so.3
Good news: it isn't an installation problem.
Are there any NaNs in the data? There are checks in the code to detect them, since LAPACK doesn't handle them, but perhaps they are not catching all instances.
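A quick way to rule out NaN or infinite values independently of pyEDM is to count them per column with pandas and NumPy. This is only a sketch: the helper name `report_non_finite` and the small example table are made up for illustration and are not part of pyEDM.

```python
import numpy as np
import pandas as pd

def report_non_finite(df: pd.DataFrame) -> pd.DataFrame:
    """Count NaN and infinite entries in each numeric column."""
    numeric = df.select_dtypes(include="number")
    values = numeric.to_numpy(dtype=float)
    return pd.DataFrame(
        {
            "n_nan": np.isnan(values).sum(axis=0),
            "n_inf": np.isinf(values).sum(axis=0),
        },
        index=numeric.columns,
    )

# Hypothetical data standing in for the species table
example = pd.DataFrame(
    {
        "sp_A": [-0.04, -0.04, np.nan],
        "sp_B": [-0.122, np.inf, -0.256],
    }
)
report = report_non_finite(example)
print(report)
```

If every count is zero for the columns passed to SMap, NaN or infinite input can be set aside as the cause.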
It seems the function call dgelss() from LAPACK is returning an error, and that raises an error with the message: Lapack_SVD(): dgelss failed.
The returned error code is not reported in the error message. This needs to be fixed.
If the problem is not NaN in the data, the problem may be ill-posed. Here is the error code documentation from LAPACK; I presume you are hitting the INFO > 0 case.
INFO is INTEGER
= 0: successful exit
< 0: if INFO = -i, the i-th argument had an illegal value.
> 0: the algorithm for computing the SVD failed to converge;
if INFO = i, i off-diagonal elements of an intermediate
bidiagonal form did not converge to zero.
If that is the case, we can check this with an independent solution code.
It sounds like the data (embedding and target) are producing an ill-conditioned problem for the SVD.
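As one way to run such an independent check, the sketch below solves a least-squares problem with NumPy and reports the rank and condition number of the design matrix. Assumptions: `np.linalg.lstsq` is also SVD-based but is not necessarily the same LAPACK driver that pyEDM calls, the function name `lstsq_diagnostics` is hypothetical, and the matrices are synthetic stand-ins for an embedding, not the reporter's data.

```python
import numpy as np

def lstsq_diagnostics(A, b):
    """Solve A x ~ b by SVD-based least squares; report rank and conditioning."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    _, _, rank, sv = np.linalg.lstsq(A, b, rcond=None)
    # Ratio of largest to smallest singular value; infinite if exactly singular
    cond = np.inf if sv[-1] == 0 else sv[0] / sv[-1]
    return {"rank": int(rank), "n_columns": A.shape[1], "condition_number": cond}

rng = np.random.default_rng(0)
target = rng.normal(size=50)

# Intercept column plus two columns with real variance
well_posed = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
# Intercept column plus two constant columns: every column is a multiple of the first
constant = np.column_stack([np.ones(50), np.full((50, 2), -0.0485)])

good = lstsq_diagnostics(well_posed, target)
bad = lstsq_diagnostics(constant, target)
print(good)
print(bad)
```

A rank below the number of columns, or a very large condition number, indicates that the system handed to the solver is (nearly) singular.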
What value of theta are you using? A theta = 0 applies uniform weight instead of an exponential one.
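To make the role of theta concrete, here is a minimal sketch of the usual S-map weighting, w_i = exp(-theta * d_i / mean(d)), where d_i is the distance from the prediction point to library point i. This follows the standard formulation of the method; pyEDM's internal code may differ in detail, and the function name `smap_weights` is hypothetical.

```python
import numpy as np

def smap_weights(distances, theta):
    """Exponential localisation weights, normalised by the mean distance."""
    d = np.asarray(distances, dtype=float)
    return np.exp(-theta * d / d.mean())

d = np.array([0.1, 0.5, 1.0, 2.0])      # illustrative distances
uniform = smap_weights(d, theta=0.0)    # theta = 0: every weight equals 1
localized = smap_weights(d, theta=3.0)  # theta > 0: nearer points weigh more
print(uniform)
print(localized)
```

With theta = 0 the weights carry no information about distance, so the fit reduces to an ordinary (global) linear regression.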
I wonder if using the Ridge regression solver with regularization would help?
Here's an example you should be able to follow in your code.
>>> from pyEDM import *
>>> from sklearn.linear_model import Ridge
>>> solver = Ridge( alpha = 0.5 )
>>> sm = SMap( dataFrame = sampleData['circle'], lib = "1 100", pred = "101 198", embedded = True, E = 2, theta = 3.14, columns = "x y", target = "x", showPlot = True, solver = solver )
Sorry for taking a long time to respond.
We have been using the pyEDM.PredictNonlinear function and took the theta value at maximum rho. That theta is not equal to 0.
I also tried the Ridge regression solver, but it did not resolve the error.
Later I removed the theta parameter from the pyEDM.SMap call, and the code then ran for all species, although the output graphs were quite different from those obtained with the optimized theta. I wonder what kind of theta values could give rise to such an error in Lapack_SVD(): dgelss.
Thank you for the assistance, as always.
Thanks for the feedback.
Using PredictNonlinear() with no theta argument will use a default set of theta values:
ThetaValues( { 0.01, 0.1, 0.3, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9 } )
calling SMap for each theta. You suggest this works for the data in question.
Calling SMap with no theta parameter argument defaults to theta = 0, the global linear map. You suggest this also works for the data in question.
This seems to suggest that something else, not the value of theta, could be the problem?
Perhaps we should look at your function call parameters? I'm willing to look at the data as time allows.
I'm truly grateful for your help and feedback. Is there a way, other than this public forum, that I could send you the pipeline code I am using and a sample data set that causes the error?
Thanks for the data and code. Almost certainly this is an ill-posed problem for the LAPACK dgelss solver.
>>> from pyEDM import *
>>> from pandas import read_csv
>>> df = read_csv('TestRun.csv')
>>> df.iloc[:, 1:4].quantile( [0.1, 0.25, 0.5, 0.75, 0.9], 'rows' )
Sp_001 Sp_002
0.10 -0.048507 -0.122024
0.25 -0.048507 -0.122024
0.50 -0.048507 -0.122024
0.75 -0.048507 -0.122024
0.90 -0.048507 -0.122024
You are trying to solve a static problem with constant values. This works with theta = 0 since it is a multiple linear regression with unity weights.
I don't think SMap is the right tool to use on data where 1600 out of 1700 values are constant.
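A simple pre-check along these lines is to measure, for each column, what fraction of rows equals the column's most common value, both over the whole series and over the rows used as the library. The sketch below is an illustration only: `modal_fraction` is a hypothetical helper, the toy table is made up, and a 1-based inclusive library such as lib = "1 100" is assumed to correspond to `iloc[0:100]`.

```python
import pandas as pd

def modal_fraction(df, first_row=0, last_row=None):
    """Fraction of rows in each column equal to that column's most common value."""
    window = df.iloc[first_row:last_row]
    # value_counts sorts by frequency, so the first entry is the modal share
    # (NaNs are ignored; an all-NaN column would need separate handling)
    return window.apply(lambda col: col.value_counts(normalize=True).iloc[0])

# Hypothetical toy data: one nearly constant column, one fully constant column
toy = pd.DataFrame(
    {
        "Sp_001": [-0.048507] * 9 + [1.2],
        "Sp_002": [-0.122024] * 10,
    }
)
whole = modal_fraction(toy)       # share of the modal value over all rows
head = modal_fraction(toy, 0, 5)  # same, restricted to the first five rows
print(whole)
print(head)
```

A value close to 1.0 inside the library window means the library contains almost no variance for a locally weighted regression to work with.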
If you really insist, the computational problem is that the library is full of constant, equidistant points since you only use the first 100 points. Indeed:
>>> SMap( dataFrame = df.iloc[:, 1:4], columns = 'Sp_001', target = 'Sp_002', E = 5,
theta = 2.2, lib = '1 100', pred = '101 105' )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpark/.local/lib/python3.8/site-packages/pyEDM/CoreEDM.py", line 219, in SMap
D = pyBindEDM.SMap( pathIn,
RuntimeError: Lapack_SVD(): dgelss failed.
With a library that covers all the data, there is enough variance to keep the problem from becoming singular, but I doubt the results are meaningful:
>>> SMap( dataFrame = df.iloc[:, 1:4], columns = 'Sp_001', target = 'Sp_002', E = 5,
theta = 2.2, lib = '1 1700', pred = '101 105' )
{'predictions': index Observations Predictions Pred_Variance
0 1993/07/19 -0.122024 NaN NaN
1 1993/07/27 -0.122024 0.002114 1.016464
2 1993/07/28 -0.122024 0.002114 1.016464
3 1993/08/09 -0.122024 0.002114 1.016464
4 1993/08/18 -0.122024 0.002114 1.016464
5 1993/08/30 -0.122024 0.002114 1.016464,
'coefficients': index C0 ... ∂Sp_001(t-3)/∂Sp_002 ∂Sp_001(t-4)/∂Sp_002
0 1993/07/19 NaN ... NaN NaN
1 1993/07/27 0.002089 ... -0.000101 -0.000101
2 1993/07/28 0.002089 ... -0.000101 -0.000101
3 1993/08/09 0.002089 ... -0.000101 -0.000101
4 1993/08/18 0.002089 ... -0.000101 -0.000101
5 1993/08/30 0.002089 ... -0.000101 -0.000101
[6 rows x 7 columns]}