Torch cluster fails to install correctly on latest GH actions runner image

Question

a-r-j opened this issue 5 months ago · 5 comments

I have a CI pipeline that depends on torch-cluster. It previously worked well with 1.6.3 and the latest torch/PyG versions. Recently, however, my CI broke, I believe as a result of the latest GitHub Actions runner image. I'm having a hard time finding the root cause because, other than the image, all versions should be the same. Do you see anything in the release notes that would cause the following error? I don't think this is a bug in torch_cluster per se, as it worked just fine before the runner version bump.

tests/protein/tensor/test_angles.py:7: in <module>
    from graphein.protein.tensor.angles import (
graphein/protein/tensor/__init__.py:9: in <module>
    from .data import Protein
graphein/protein/tensor/data.py:19: in <module>
    import torch_geometric
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/__init__.py:6: in <module>
    import torch_geometric.datasets
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/datasets/__init__.py:100: in <module>
    from .explainer_dataset import ExplainerDataset
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/datasets/explainer_dataset.py:8: in <module>
    from torch_geometric.explain import Explanation
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/explain/__init__.py:3: in <module>
    from .algorithm import *  # noqa
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/explain/algorithm/__init__.py:1: in <module>
    from .base import ExplainerAlgorithm
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/explain/algorithm/base.py:14: in <module>
    from torch_geometric.nn import MessagePassing
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/nn/__init__.py:5: in <module>
    from .to_hetero_with_bases_transformer import to_hetero_with_bases
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/nn/to_hetero_with_bases_transformer.py:9: in <module>
    from torch_geometric.nn.conv import MessagePassing
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/nn/conv/__init__.py:8: in <module>
    from .gravnet_conv import GravNetConv
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_geometric/nn/conv/gravnet_conv.py:12: in <module>
    from torch_cluster import knn
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_cluster/__init__.py:18: in <module>
    torch.ops.load_library(spec.origin)
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch/_ops.py:852: in load_library
    ctypes.CDLL(path)
/usr/share/miniconda3/envs/test/lib/python3.10/ctypes/__init__.py:374: in __init__
    self._handle = _dlopen(self._name, mode)
E   OSError: /usr/share/miniconda3/envs/test/lib/python3.10/site-packages/torch_cluster/_version_cpu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
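
An undefined-symbol error when loading the compiled extension commonly points to a binary built against a different torch build than the one installed. A minimal sketch for recording what is installed in the CI environment without importing the extension that fails to load. The distribution names passed in are assumptions (conda and pip may register different names), and `installed_versions` is a hypothetical helper, not part of torch or PyG.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            # Reads package metadata only; does not import the package itself.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust to what the environment reports.
    wanted = ["torch", "torch-cluster", "torch-geometric"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this as a CI step before the tests makes it easier to compare a passing run with a failing one.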

Answer 1 · 2024-02-07T19:08:52.000Z

We have just updated the wheels for PyTorch 2.2. Did you recently upgrade to PyTorch 2.2, but still download the wheels for PyTorch 2.1?
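
A minimal sketch of how the wheel index can be matched to the installed torch version, assuming the `https://data.pyg.org/whl/torch-<version>+<compute>.html` pattern described in the PyG installation instructions, with the patch version pinned to 0. `pyg_wheel_index` is a hypothetical helper, and the pattern should be checked against the current docs before relying on it.

```python
import re


def pyg_wheel_index(torch_version, compute="cpu"):
    """Build the assumed PyG wheel index URL for a torch version string."""
    # Keep only major.minor; this ignores the patch number and any local
    # suffix such as "+cu121" or "+cpu" that torch.__version__ may carry.
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognised torch version: {torch_version!r}")
    major, minor = match.groups()
    return f"https://data.pyg.org/whl/torch-{major}.{minor}.0+{compute}.html"


print(pyg_wheel_index("2.2.0+cpu"))
print(pyg_wheel_index("2.1.2", compute="cu121"))
```

In a pipeline, the argument would be `torch.__version__` and the result would be passed to `pip install torch-cluster -f <url>`, so that upgrading torch cannot silently leave the wheel index pointing at the previous release.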

Answer 2 · 2024-02-07T19:49:11.000Z

I did recently add PyTorch 2.2, but I was also getting failures on 1.13 and 2.1, which weren't happening before. I managed to resolve it by installing torch and the PyG stack from PyPI instead of conda: a-r-j/graphein@53290a5#diff-d0777657fa3fd81d23aaf7273e58aee453b04e67882517900c56daeef9b3e4c1

Answer 3 · 2024-02-08T07:42:36.000Z

Can you point me to the CI failure when using conda/mamba?

Answer 4 · 2024-02-09T06:38:20.000Z

Mh, you can see in the failed run that it installs the wrong version:

+ pytorch-cluster    1.6.3  py310_torch_2.1.0_cpu  pyg         499kB

I need to figure out why this is happening.
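
The build string in the log line above encodes the torch version the package was compiled for. A minimal sketch of a check that could flag this kind of mismatch in CI. Both helper names are hypothetical, the build-string layout is inferred from this single log line, and the installed torch version used in the example (2.2.0) is an assumption for illustration.

```python
import re


def torch_version_from_build(build):
    """Extract the torch version embedded in a conda build string, if any."""
    match = re.search(r"torch_(\d+(?:\.\d+)*)", build)
    return match.group(1) if match else None


def same_minor(version_a, version_b):
    """Compare two versions on major.minor only, ignoring '+cpu'-style suffixes."""
    def key(version):
        return version.split("+")[0].split(".")[:2]
    return key(version_a) == key(version_b)


build = "py310_torch_2.1.0_cpu"  # build string from the log line above
built_for = torch_version_from_build(build)
print(built_for)  # 2.1.0

# "2.2.0" stands in for the installed torch version (assumed for illustration).
print(same_minor(built_for, "2.2.0"))  # False
```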