scverse/rapids_singlecell

[BUG] CUSOLVER_STATUS_EXECUTION_FAILED on tl.pca

Closed this issue · 2 comments

Describe the bug
When I try to run rsc.tl.pca on an example dataset, it fails with CUSOLVERError: CUSOLVER_STATUS_EXECUTION_FAILED.

Stacktrace
---------------------------------------------------------------------------
CUSOLVERError                             Traceback (most recent call last)
File /home/sturmgre/projects/bi-clinbias/bi_907828_1403_mdm2/1403_0001/1403-0001_FlowCytometry/Analysis/_test.qmd:4
      2 sc.pp.normalize_total(adata)
      3 sc.pp.log1p(adata)
----> 4 rsc.tl.pca(adata)

File /cfs/sturmgre/conda/envs/1403-0001_rapids-singlecell-2402/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_pca.py:123, in pca(adata, layer, n_comps, zero_center, random_state, use_highly_variable, chunked, chunk_size)
    121         X = csr_matrix(X)
    122     pca_func = PCA_sparse(n_components=n_comps)
--> 123     X_pca = pca_func.fit_transform(X)
    124 else:
    125     pca_func = PCA(
    126         n_components=n_comps, random_state=random_state, output_type="numpy"
    127     )

File /cfs/sturmgre/conda/envs/1403-0001_rapids-singlecell-2402/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_pca.py:207, in PCA_sparse.fit_transform(self, X, y)
    206 def fit_transform(self, X, y=None):
--> 207     return self.fit(X).transform(X)

File /cfs/sturmgre/conda/envs/1403-0001_rapids-singlecell-2402/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_pca.py:172, in PCA_sparse.fit(self, x)
    168 self.dtype = x.data.dtype
    170 covariance, self.mean_, _ = _cov_sparse(x=x, return_mean=True)
--> 172 self.explained_variance_, self.components_ = cp.linalg.eigh(
    173     covariance, UPLO="U"
    174 )
    176 # NOTE: We reverse the eigen vector and eigen values here
    177 # because cupy provides them in ascending order. Make a copy otherwise
    178 # it is not C_CONTIGUOUS anymore and would error when converting to
    179 # CumlArray
    180 self.explained_variance_ = self.explained_variance_[::-1]

File /cfs/sturmgre/conda/envs/1403-0001_rapids-singlecell-2402/lib/python3.10/site-packages/cupy/linalg/_eigenvalue.py:147, in eigh(a, UPLO)
    145     return w, v
    146 else:
--> 147     return _syevd(a, UPLO, True)

File /cfs/sturmgre/conda/envs/1403-0001_rapids-singlecell-2402/lib/python3.10/site-packages/cupy/linalg/_eigenvalue.py:60, in _syevd(a, UPLO, with_eigen_vector, overwrite_a)
     58     work_device = cupy.empty(work_device_size, 'b')
     59     work_host = numpy.empty(work_host_sizse, 'b')
---> 60     cusolver.xsyevd(
     61         handle, params, jobz, uplo, m, type_v, v.data.ptr, lda,
     62         type_w, w.data.ptr, type_v,
     63         work_device.data.ptr, work_device_size,
     64         work_host.ctypes.data, work_host_sizse, dev_info.data.ptr)
     65 finally:
     66     cusolver.destroyParams(params)

File cupy_backends/cuda/libs/cusolver.pyx:3475, in cupy_backends.cuda.libs.cusolver.xsyevd()

File cupy_backends/cuda/libs/cusolver.pyx:3489, in cupy_backends.cuda.libs.cusolver.xsyevd()

File cupy_backends/cuda/libs/cusolver.pyx:1079, in cupy_backends.cuda.libs.cusolver.check_status()

CUSOLVERError: CUSOLVER_STATUS_EXECUTION_FAILED

Steps/Code to reproduce bug

import scanpy as sc
import rapids_singlecell as rsc

adata = sc.datasets.pbmc3k()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
rsc.tl.pca(adata)

Environment details (please complete the following information):

  • Environment location: bare metal
  • Linux Distro/Architecture: Scientific Linux 7
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:CA:00.0 Off |                  Off |
| 30%   37C    P2              76W / 300W |  22116MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
pip list
Package                   Version         Editable project location
------------------------- --------------- -------------------------------------------------------------------------------------------------------------------------
aiohttp                   3.9.3
aiosignal                 1.3.1
anndata                   0.10.5.post1
anyio                     4.3.0
appdirs                   1.4.4
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
array_api_compat          1.5
arrow                     1.3.0
asciitree                 0.3.3
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.2.0
Babel                     2.14.0
beautifulsoup4            4.12.3
bleach                    6.1.0
bokeh                     3.3.4
branca                    0.7.1
Brotli                    1.1.0
cached-property           1.5.2
cachetools                5.3.3
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
click-plugins             1.1.1
cligj                     0.7.2
cloudpickle               3.0.0
colorama                  0.4.6
colorcet                  3.1.0
comm                      0.2.1
confluent-kafka           1.9.2
contourpy                 1.2.0
cubinlinker               0.3.0
cucim                     24.2.0
cuda-python               11.8.3
cudf                      24.2.2
cudf_kafka                24.2.2
cugraph                   24.2.0
cuml                      24.2.0
cuproj                    24.2.0
cupy                      12.2.0
cuspatial                 24.2.0
custreamz                 24.2.2
cuxfilter                 24.2.0
cycler                    0.12.1
cytoolz                   0.12.3
dask                      2024.1.1
dask-cuda                 24.2.0
dask-cudf                 24.2.2
datashader                0.16.0
debugpy                   1.8.1
decorator                 5.1.1
decoupler                 1.6.0
defusedxml                0.7.1
distributed               2024.1.1
docrep                    0.3.2
entrypoints               0.4
et-xmlfile                1.1.0
exceptiongroup            1.2.0
executing                 2.0.1
fa2                       0.3.5
fasteners                 0.19
fastjsonschema            2.19.1
fastrlock                 0.8.2
fcsparser                 0.2.8
filelock                  3.13.1
fiona                     1.9.5
folium                    0.16.0
fonttools                 4.49.0
fqdn                      1.5.1
frozenlist                1.4.1
fsspec                    2024.2.0
GDAL                      3.8.1
gdown                     5.1.0
geopandas                 0.14.3
h11                       0.14.0
h2                        4.1.0
h5py                      3.10.0
harmonypy                 0.0.9
holoviews                 1.18.3
hpack                     4.0.0
httpcore                  1.0.4
httpx                     0.27.0
hyperframe                6.0.1
idna                      3.6
igraph                    0.11.4
imagecodecs               2024.1.1
imageio                   2.34.0
importlib_metadata        7.0.2
importlib_resources       6.1.3
inflect                   7.0.0
ipykernel                 6.29.3
ipylab                    1.0.0
ipython                   8.22.2
ipywidgets                8.1.2
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.3
joblib                    1.3.2
json5                     0.9.22
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter_client            8.6.0
jupyter_core              5.7.1
jupyter-events            0.9.0
jupyter-lsp               2.2.4
jupyter_server            2.13.0
jupyter_server_proxy      4.1.0
jupyter_server_terminals  0.5.2
jupyterlab                4.1.4
jupyterlab_pygments       0.3.0
jupyterlab_server         2.25.3
jupyterlab_widgets        3.0.10
kiwisolver                1.4.5
lamin_utils               0.13.0
lazy_loader               0.3
legacy-api-wrap           1.4
leidenalg                 0.10.2
linkify-it-py             2.0.3
llvmlite                  0.42.0
locket                    1.0.0
louvain                   0.8.1
lz4                       4.3.3
mapclassify               2.6.1
Markdown                  3.5.2
markdown-it-py            3.0.0
MarkupSafe                2.1.5
matplotlib                3.8.3
matplotlib-inline         0.1.6
mdit-py-plugins           0.4.0
mdurl                     0.1.2
mistune                   3.0.2
msgpack                   1.0.7
mudata                    0.2.3
multidict                 6.0.5
multipledispatch          0.6.0
munkres                   1.1.4
muon                      0.1.5
natsort                   8.4.0
nbclient                  0.8.0
nbconvert                 7.16.2
nbformat                  5.9.2
nbproject                 0.10.1
nest_asyncio              1.6.0
networkx                  3.2.1
notebook                  7.1.1
notebook_shim             0.2.4
numba                     0.59.0
numcodecs                 0.12.1
numpy                     1.24.4
nvtx                      0.2.10
omnipath                  1.0.8
openpyxl                  3.1.2
orjson                    3.9.15
overrides                 7.7.0
packaging                 24.0
pandas                    1.5.3
pandocfilters             1.5.0
panel                     1.3.8
param                     2.0.2
parso                     0.8.3
partd                     1.4.1
patsy                     0.5.6
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    10.2.0
pip                       24.0
pkgutil_resolve_name      1.3.10
platformdirs              4.2.0
prometheus_client         0.20.0
prompt-toolkit            3.0.42
protobuf                  4.25.3
psutil                    5.9.8
ptxcompiler               0.8.1
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   14.0.2
pyarrow-hotfix            0.6
pycparser                 2.21
pyct                      0.5.0
pydantic                  1.10.14
pyee                      8.1.0
Pygments                  2.17.2
pylibcugraph              24.2.0
pylibraft                 24.2.0
pynndescent               0.5.11
pynvml                    11.4.1
pyparsing                 3.1.2
pyppeteer                 1.0.2
pyproj                    3.6.1
PySocks                   1.7.1
python-dateutil           2.9.0
python-json-logger        2.0.7
pytometry                 0.1.4
pytz                      2024.1
pyviz_comms               3.0.1
PyWavelets                1.4.1
PyYAML                    6.0.1
pyzmq                     25.1.2
raft-dask                 24.2.0
rapids_singlecell         0.9.6
readfcs                   1.1.7
referencing               0.33.0
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.7.1
rmm                       24.2.0
rpds-py                   0.18.0
Rtree                     1.2.0
scanpy                    1.10.0rc2
scikit-image              0.22.0
scikit-learn              1.4.1.post1
scikit-misc               0.3.1
scipy                     1.12.0
seaborn                   0.13.2
Send2Trash                1.8.2
session-info              1.0.0
setuptools                69.1.1
shapely                   2.0.3
simpervisor               1.0.0
single_cell_helper        0.0.1           /home/sturmgre/projects/bi-clinbias/bi_907828_1403_mdm2/1403_0001/1403-0001_FlowCytometry/Analysis/lib/single-cell-helper
six                       1.16.0
sniffio                   1.3.1
sortedcontainers          2.4.0
soupsieve                 2.5
stack-data                0.6.2
statsmodels               0.14.1
stdlib-list               0.10.0
streamz                   0.6.4
tblib                     3.0.0
terminado                 0.18.0
texttable                 1.7.0
threadpoolctl             3.3.0
tifffile                  2024.2.12
tinycss2                  1.2.1
tomli                     2.0.1
toolz                     0.12.1
tornado                   6.4
tqdm                      4.66.2
traitlets                 5.14.1
treelite                  4.0.0
types-python-dateutil     2.8.19.20240311
typing_extensions         4.10.0
typing-utils              0.1.0
uc-micro-py               1.0.3
ucx-py                    0.36.0
umap-learn                0.5.5
unicodedata2              15.1.0
uri-template              1.3.0
urllib3                   1.26.18
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.7.0
websockets                10.4
wget                      3.2
wheel                     0.42.0
widgetsnbextension        4.0.10
wrapt                     1.16.0
xarray                    2024.2.0
xgboost                   2.0.3
xyzservices               2023.10.1
yarl                      1.9.4
zarr                      2.17.1
zict                      3.0.0
zipp                      3.17.0


@grst I did some quick testing. cuml doesn't use an SVD solver for sparse matrices; it uses the Gram-matrix method instead. This leads to issues with the eigenvalue calculation when a feature is zero in every cell.

import scanpy as sc
import rapids_singlecell as rsc

adata = sc.datasets.pbmc3k()
# Drop genes that are not expressed in any cell (all-zero features)
sc.pp.filter_genes(adata, min_cells=1)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
rsc.tl.pca(adata)

This would work. I could either run a QC step before the sparse PCA and drop all empty features for the purpose of the computation, or just raise a more informative error. Which would you prefer?
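For readers unfamiliar with the Gram-matrix approach mentioned above, here is a minimal CPU sketch using NumPy and SciPy. It is not the rapids_singlecell or cuml implementation: the helper name `cov_from_gram` is hypothetical, and the normalization by `n` (population covariance) is an assumption made for illustration. It only shows that an all-zero feature produces an all-zero row and column in the covariance matrix, which makes that matrix singular. NumPy's CPU solver copes with this input; the cuSOLVER failure is as reported in this thread.

```python
import numpy as np
from scipy import sparse


def cov_from_gram(x):
    """Column covariance of a sparse matrix via the Gram matrix.

    Hypothetical helper for illustration; uses E[x x^T] - mu mu^T so the
    sparse input never has to be centered (which would densify it).
    """
    n = x.shape[0]
    gram = (x.T @ x).toarray() / n              # E[x x^T], (n_features, n_features)
    mean = np.asarray(x.mean(axis=0)).ravel()   # per-feature mean
    cov = gram - np.outer(mean, mean)           # population covariance
    return cov, mean


rng = np.random.default_rng(0)
dense = rng.poisson(1.0, size=(50, 4)).astype(np.float64)
dense[:, 2] = 0.0                               # a feature that is zero in every cell
x = sparse.csr_matrix(dense)

cov, mean = cov_from_gram(x)

# The empty feature contributes an all-zero row and column,
# so the covariance matrix is singular (one eigenvalue is ~0).
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
print("row for empty feature:", cov[2])
print("eigenvalues:", eigvals)
```

Filtering with `sc.pp.filter_genes(adata, min_cells=1)` removes such columns before the covariance matrix is built.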

Thanks, I think a better error message pointing to filter_genes would make sense!

Personally, I don't like it when functions do some magic without me knowing.
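A rough sketch of what such a check could look like, assuming it runs on the sparse matrix before the PCA is computed. The function name `check_no_empty_features` and the wording of the message are hypothetical and not taken from rapids_singlecell; the example uses SciPy on the CPU rather than CuPy so it can be run anywhere.

```python
import numpy as np
from scipy import sparse


def check_no_empty_features(x):
    """Raise a descriptive error if any feature (column) is zero in every cell.

    Hypothetical helper for illustration only.
    """
    x = sparse.csr_matrix(x)
    # Count true nonzero values per column; explicitly stored zeros are ignored.
    nnz_per_feature = np.asarray((x != 0).sum(axis=0)).ravel()
    n_empty = int((nnz_per_feature == 0).sum())
    if n_empty:
        raise ValueError(
            f"{n_empty} feature(s) are zero in every cell, which breaks the "
            "eigenvalue computation of the sparse PCA. Remove them first, "
            "e.g. with sc.pp.filter_genes(adata, min_cells=1)."
        )


# Usage: the second column is empty, so the check raises instead of
# letting the solver fail later with an opaque CUSOLVERError.
counts = sparse.csr_matrix(np.array([[1.0, 0.0, 3.0], [2.0, 0.0, 0.0]]))
try:
    check_no_empty_features(counts)
except ValueError as err:
    print(err)
```

Raising instead of silently dropping features keeps the behavior explicit, in line with the preference stated above.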