metaopt/torchopt

[BUG] metaopt/optree is not a public repo

vmichals opened this issue · 4 comments

Describe the bug

I have trouble installing optree from PyPI on our compute clusters, getting undefined symbol errors. I saw the link to an optree GitHub repo on PyPI, so I thought I could compile it from source, but the repo either doesn't exist or is not public.

To Reproduce

Visit https://github.com/metaopt/optree without logging into a collaborator account.

Expected behavior

The repo should be accessible

System info

N/A

Additional context

Our compute cluster uses a module system with customized builds, sometimes requiring us to build packages from source.

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

> but the repo either doesn't exist or is not public.

Sorry for this. The repo is still under heavy development. We are working on the repo for better code quality (e.g. docs/tests/benchmarks). We will open-source it in a few weeks.

> I have trouble installing optree from PyPI on our compute clusters, getting undefined symbol errors.

@vmichals Could you elaborate on this, for example with your system version and the traceback? Many thanks. We could provide a post-fix wheel on PyPI if necessary.

Thanks for your quick response!

I understand. I thought it might be for that reason! :)

I'm trying to use torchopt on the Béluga cluster of Compute Canada (https://docs.alliancecan.ca/wiki/B%C3%A9luga/en). Python packages are mostly installed from wheels hosted on the cluster itself (I guess for compatibility and optimization reasons).

I received the following error when I tried installing torchopt via pip install torchopt (it is not available on the cluster, so pip downloads the package from PyPI):

ERROR: Could not find a version that satisfies the requirement optree>=0.2.0 (from torchopt) (from versions: none)
ERROR: No matching distribution found for optree>=0.2.0

Note the (from versions: none), which I think is due to the specialized builds in the software stack. A common workaround (which helped for some other packages) is to get the manylinux wheel with the matching Python version (in my case optree-0.2.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) and rename it to match version none (in this case optree-0.2.0-py38-none-linux_x86_64.whl). I did that and was able to install torchopt, but got the following traceback when I tried to import the package:

ImportError                               Traceback (most recent call last)
Cell In [1], line 1
----> 1 import torchopt

File ~/rlvenv/lib/python3.8/site-packages/torchopt/__init__.py:17
      1 # Copyright 2022 MetaOPT Team. All Rights Reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     13 # limitations under the License.
     14 # ==============================================================================
     15 """TorchOpt: a high-performance optimizer library built upon PyTorch."""
---> 17 from torchopt._src import (
     18     accelerated_op_available,
     19     clip,
     20     combine,
     21     hook,
     22     implicit_diff,
     23     linear_solve,
     24     schedule,
     25     visual,
     26 )
     27 from torchopt._src.alias import adam, adamw, rmsprop, sgd
     28 from torchopt._src.clip import clip_grad_norm

File ~/rlvenv/lib/python3.8/site-packages/torchopt/_src/__init__.py:16
      1 # Copyright 2022 MetaOPT Team. All Rights Reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     13 # limitations under the License.
     14 # ==============================================================================
---> 16 from torchopt._src.accelerated_op import accelerated_op_available

File ~/rlvenv/lib/python3.8/site-packages/torchopt/_src/accelerated_op/__init__.py:20
     16 from typing import Iterable, Optional, Union
     18 import torch
---> 20 from torchopt._src.accelerated_op.adam_op import AdamOp
     23 def accelerated_op_available(
     24     devices: Optional[Union[str, torch.device, Iterable[Union[str, torch.device]]]] = None
     25 ) -> bool:
     26     """Check the availability of accelerated optimizer."""

File ~/rlvenv/lib/python3.8/site-packages/torchopt/_src/accelerated_op/adam_op.py:22
     18 from typing import Any, Optional, Tuple
     20 import torch
---> 22 from torchopt._C import adam_op  # pylint: disable=no-name-in-module
     25 class AdamOp:  # pylint: disable=too-few-public-methods
     26     """Fused accelerated Adam operators."""

ImportError: /home/michals/rlvenv/lib/python3.8/site-packages/torchopt/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs

Please let me know if I can provide further details.

@vmichals

> A common work-around (which helped for some other packages) is to get the manylinux wheel with the matching python version (in my case optree-0.2.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) and rename it to match version none (in this case optree-0.2.0-py38-none-linux_x86_64.whl).

I guess you are using CPython, not PyPy, right? You need to use optree-0.2.0-cp38-*.whl rather than the pp38 wheel.

wget https://files.pythonhosted.org/packages/71/f0/219032e7b5c346d68794c3b244302218c56d35bebeb5265275d0996fa76f/optree-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
pip3 install ./optree-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
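To see why a pp38 wheel does not fit a CPython interpreter, the tags in a wheel filename can be compared with the running interpreter. The following is a minimal sketch using only the standard library. The helper names parse_wheel_tags and expected_python_tag are hypothetical, chosen for illustration. It assumes the filename layout name-version[-build]-pythontag-abitag-platformtag.whl from the wheel specification, handles only the CPython and PyPy prefixes, and does a plain equality check on the Python tag, so it does not cover generic tags such as py3 or compressed tag sets.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def parse_wheel_tags(filename: str) -> WheelTags:
    """Split a wheel filename into its python/abi/platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version-python-abi-platform, with an optional
    # build tag between version and python tag (5 or 6 fields).
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python, abi, platform = parts[-3:]
    return WheelTags(python, abi, platform)


def expected_python_tag() -> str:
    """Return the interpreter-specific tag, e.g. 'cp38' or 'pp38'."""
    prefix = {"cpython": "cp", "pypy": "pp"}.get(sys.implementation.name)
    if prefix is None:
        raise RuntimeError(f"unhandled implementation: {sys.implementation.name}")
    return f"{prefix}{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    wheels = (
        "optree-0.2.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "optree-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    )
    wanted = expected_python_tag()
    for wheel in wheels:
        tags = parse_wheel_tags(wheel)
        verdict = "matches" if tags.python == wanted else "does not match"
        print(f"{tags.python} (abi {tags.abi}): {verdict} this interpreter ({wanted})")
```

Renaming a wheel only changes what pip is willing to install. The compiled extension inside is still built for the interpreter named in the original tags.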

File ~/rlvenv/lib/python3.8/site-packages/torchopt/_src/accelerated_op/adam_op.py:22
     18 from typing import Any, Optional, Tuple
     20 import torch
---> 22 from torchopt._C import adam_op  # pylint: disable=no-name-in-module
     25 class AdamOp:  # pylint: disable=too-few-public-methods
     26     """Fused accelerated Adam operators."""

ImportError: /home/michals/rlvenv/lib/python3.8/site-packages/torchopt/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs

From the traceback, we see it's an error related to torchopt._C rather than optree._C. It is trying to import symbols from libtorch. Have you installed PyTorch in your environment correctly? See torchopt#installation and https://pytorch.org.

pip3 install --upgrade 'torch>=1.12.0' --extra-index-url https://download.pytorch.org/whl/cu116
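The missing symbol itself carries a hint. In _ZN3c106detail14torchCheckFailEPKcS2_jRKSs, the trailing Ss is the Itanium C++ ABI abbreviation for the pre-C++11 std::string, whereas a library built with the newer libstdc++ string ABI would export a name containing __cxx11 instead. One possible cause, then, is that the torchopt binary was compiled against a PyTorch build that uses a different std::string ABI than the PyTorch installed on the cluster. The sketch below is a rough heuristic, not a demangler: guess_string_abi and report_torch_abi are hypothetical helper names, and the pattern match on Ss can give false positives on other symbols. The torch.compiled_with_cxx11_abi() call is only attempted if PyTorch is importable.

```python
import re


def guess_string_abi(mangled: str) -> str:
    """Heuristically guess which libstdc++ std::string ABI a mangled name uses."""
    if "__cxx11" in mangled:
        # std::__cxx11::basic_string appears spelled out in the name.
        return "cxx11"
    # 'Ss' abbreviates the old std::string; look for it after a
    # qualifier (R, K, P, O) or at the very end of the name.
    if re.search(r"[RKPO]Ss|Ss$", mangled):
        return "pre-cxx11"
    return "unknown"


def report_torch_abi() -> None:
    """Print the string ABI of the installed PyTorch, if there is one."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not importable in this environment")
        return
    print("torch", torch.__version__, "built with cxx11 ABI:", torch.compiled_with_cxx11_abi())


if __name__ == "__main__":
    symbol = "_ZN3c106detail14torchCheckFailEPKcS2_jRKSs"  # from the traceback
    print("extension expects:", guess_string_abi(symbol))
    report_torch_abi()
```

If the two answers disagree, installing a PyTorch build whose ABI matches the torchopt wheel, or building torchopt from source against the cluster's PyTorch, would be the things to try.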

We have just open-sourced OpTree at metaopt/optree. Thanks for your attention.