metaopt/torchopt

[BUG] Separate CPU / CUDA wheels

XuehaiPan opened this issue · 2 comments

Describe the bug

Wheels built with torch==1.12.0+cu116 are incompatible with torch==1.12.0+cpu (see the CI output at https://github.com/metaopt/TorchOpt/runs/7553102052 for more details). Different torch builds ship with different libraries:

torch==1.12.0+cu116:

$ ls $SITE_PACKAGES/torch/lib
total 3.3G
-rwxr-xr-x 1 root root 1.2M Jul 28 04:01 libc10_cuda.so
-rwxr-xr-x 1 root root 751K Jul 28 04:01 libc10.so
-rwxr-xr-x 1 root root  25K Jul 28 04:01 libcaffe2_nvrtc.so
-rwxr-xr-x 1 root root 335M Jul 28 04:01 libcublasLt.so.11
-rwxr-xr-x 1 root root 150M Jul 28 04:01 libcublas.so.11
-rwxr-xr-x 1 root root 668K Jul 28 04:01 libcudart-45da57e3.so.11.0
-rwxr-xr-x 1 root root 124M Jul 28 04:01 libcudnn_adv_infer.so.8
-rwxr-xr-x 1 root root  92M Jul 28 04:01 libcudnn_adv_train.so.8
-rwxr-xr-x 1 root root 774M Jul 28 04:01 libcudnn_cnn_infer.so.8
-rwxr-xr-x 1 root root  85M Jul 28 04:01 libcudnn_cnn_train.so.8
-rwxr-xr-x 1 root root  86M Jul 28 04:01 libcudnn_ops_infer.so.8
-rwxr-xr-x 1 root root  68M Jul 28 04:01 libcudnn_ops_train.so.8
-rwxr-xr-x 1 root root 155K Jul 28 04:01 libcudnn.so.8
-rwxr-xr-x 1 root root 165K Jul 28 04:01 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root  44M Jul 28 04:01 libnvrtc-4dd39364.so.11.2
-rwxr-xr-x 1 root root 6.8M Jul 28 04:01 libnvrtc-builtins.so.11.6
-rwxr-xr-x 1 root root  43K Jul 28 04:01 libnvToolsExt-847d78f2.so.1
-rwxr-xr-x 1 root root  44K Jul 28 04:01 libshm.so
-rwxr-xr-x 1 root root 487M Jul 28 04:01 libtorch_cpu.so
-rwxr-xr-x 1 root root 216M Jul 28 04:01 libtorch_cuda_cpp.so
-rwxr-xr-x 1 root root 653M Jul 28 04:01 libtorch_cuda_cu.so
-rwxr-xr-x 1 root root 209M Jul 28 04:01 libtorch_cuda_linalg.so
-rwxr-xr-x 1 root root 163K Jul 28 04:01 libtorch_cuda.so
-rwxr-xr-x 1 root root  21K Jul 28 04:01 libtorch_global_deps.so
-rwxr-xr-x 1 root root  21M Jul 28 04:01 libtorch_python.so
-rwxr-xr-x 1 root root  16K Jul 28 04:01 libtorch.so

torch==1.12.0+cpu:

$ ls $SITE_PACKAGES/torch/lib
total 496M
-rwxr-xr-x 1 root root 269K Jul 28 04:02 libbackend_with_compiler.so
-rwxr-xr-x 1 root root 766K Jul 28 04:02 libc10.so
-rwxr-xr-x 1 root root 165K Jul 28 04:02 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 228K Jul 28 04:02 libjitbackend_test.so
-rwxr-xr-x 1 root root  35K Jul 28 04:02 libshm.so
-rwxr-xr-x 1 root root 588K Jul 28 04:02 libtorchbind_test.so
-rwxr-xr-x 1 root root 476M Jul 28 04:02 libtorch_cpu.so
-rwxr-xr-x 1 root root 8.6K Jul 28 04:02 libtorch_global_deps.so
-rwxr-xr-x 1 root root  19M Jul 28 04:02 libtorch_python.so
-rwxr-xr-x 1 root root 7.1K Jul 28 04:02 libtorch.so

In our .cxx and .cu code, we have only one include directive, #include <torch/extension.h>, and we reference only torch::Tensor and AT_DISPATCH_FLOATING_TYPES. Yet the built shared libraries link against far more libraries than expected:

$ ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007ffcd44ea000)
        libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007f75c1243000)
        libc10_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10_cuda.so (0x00007f75c109b000)
        libcaffe2_nvrtc.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcaffe2_nvrtc.so (0x00007f75c123c000)
        libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007f75c1231000)
        libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007f75c122c000)
        libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007f75a704d000)
        libtorch_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda.so (0x00007f75c120c000)
        libtorch_cuda_cpp.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cpp.so (0x00007f7599db0000)
        libtorch_cuda_cu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cu.so (0x00007f757237b000)
        libtorch_cuda_linalg.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_linalg.so (0x00007f7565905000)
        libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007f75c1203000)
        libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007f7564957000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f756474f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7564533000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f756432f000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f7564027000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f7563d25000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f7563aff000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f75638e9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f756351b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f75c1199000)
        libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007f75632f1000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00007f7561e96000)
        libnvrtc-4dd39364.so.11.2 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvrtc-4dd39364.so.11.2 (0x00007f755f075000)
        libcudart-45da57e3.so.11.0 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudart-45da57e3.so.11.0 (0x00007f755edcd000)
        libnvToolsExt-847d78f2.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007f755ebc2000)
        libcudnn.so.8 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudnn.so.8 (0x00007f755e99a000)
        libcublas.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublas.so.11 (0x00007f755521c000)
        libcublasLt.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublasLt.so.11 (0x00007f75401b6000)
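
The ldd listing above can also be checked mechanically. The following is a minimal sketch, not part of TorchOpt: the function name `cuda_dependencies` and the marker list `CUDA_MARKERS` are hypothetical, and the markers are simply guessed from the library names that appear in the listings in this issue.

```python
import re
from typing import List

# Substrings treated as signs of a CUDA-only library. This list is an
# assumption for illustration, derived from the names shown in this issue.
CUDA_MARKERS = ("cuda", "cudnn", "cublas", "nvrtc", "nvToolsExt")

# Matches dependency lines such as "libfoo.so => /path/libfoo.so (0x...)"
# and "libfoo.so => not found"; lines without "=>" are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>")


def cuda_dependencies(ldd_output: str) -> List[str]:
    """Return library names from ldd output that look CUDA-specific."""
    found = []
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match is None:
            continue
        name = match.group(1)
        if any(marker in name for marker in CUDA_MARKERS):
            found.append(name)
    return found
```

Feeding it the text printed by ldd for adam_op.cpython-37m-x86_64-linux-gnu.so would list entries such as libc10_cuda.so and libtorch_cuda.so; an empty result would suggest the extension does not pull in CUDA libraries directly.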

To Reproduce

See CI output https://github.com/metaopt/TorchOpt/runs/7553102052 for more details.

$ git clone git@github.com:XuehaiPan/TorchOpt.git && cd TorchOpt
$ git checkout cibuildwheel
$ pip3 install --upgrade cibuildwheel
$ PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu116" python3 -m cibuildwheel --platform linux --config-file=pyproject.toml
...
ls /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torch/lib
total 496M
-rwxr-xr-x 1 root root 269K Jul 28 04:02 libbackend_with_compiler.so
-rwxr-xr-x 1 root root 766K Jul 28 04:02 libc10.so
-rwxr-xr-x 1 root root 165K Jul 28 04:02 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 228K Jul 28 04:02 libjitbackend_test.so
-rwxr-xr-x 1 root root  35K Jul 28 04:02 libshm.so
-rwxr-xr-x 1 root root 588K Jul 28 04:02 libtorchbind_test.so
-rwxr-xr-x 1 root root 476M Jul 28 04:02 libtorch_cpu.so
-rwxr-xr-x 1 root root 8.6K Jul 28 04:02 libtorch_global_deps.so
-rwxr-xr-x 1 root root  19M Jul 28 04:02 libtorch_python.so
-rwxr-xr-x 1 root root 7.1K Jul 28 04:02 libtorch.so
ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007ffd5997a000)
        libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007ff16efde000)
        libc10_cuda.so => not found
        libcaffe2_nvrtc.so => not found
        libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007ff16efce000)
        libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007ff16efcb000)
        libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007ff155b9a000)
        libtorch_cuda.so => not found
        libtorch_cuda_cpp.so => not found
        libtorch_cuda_cu.so => not found
        libtorch_cuda_linalg.so => not found
        libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007ff16efc5000)
        libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007ff154dc0000)
        librt.so.1 => /lib64/librt.so.1 (0x00007ff154bb8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff15499c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007ff154798000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff154490000)
        libm.so.6 => /lib64/libm.so.6 (0x00007ff15418e000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007ff153f68000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff153d52000)
        libc.so.6 => /lib64/libc.so.6 (0x00007ff153984000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff16ef39000)
        libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007ff15375a000)
patchelf --print-rpath /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
$ORIGIN/../../torch/lib:$ORIGIN/../../torchopt.libs
make: Entering directory `/project'
/tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip show pytest &>/dev/null || (cd && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip install pytest --upgrade)
/tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip show pytest_cov &>/dev/null || (cd && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip install pytest_cov --upgrade)
/tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip show pytest_xdist &>/dev/null || (cd && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip install pytest_xdist --upgrade)
cd tests && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pytest unit --cov torchopt --durations 0 -v --cov-report term-missing --color=yes
============================= test session starts ==============================
platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /tmp/tmp.3M50Q7bV1d/venv/bin/python3
cachedir: .pytest_cache
rootdir: /project
plugins: forked-1.4.0, cov-3.0.0, xdist-2.5.0
collecting ... collected 0 items / 4 errors

==================================== ERRORS ====================================
___________________ ERROR collecting tests/unit/test_clip.py ___________________
ImportError while importing test module '/project/tests/unit/test_clip.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/test_clip.py:25: in <module>
    ???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
    from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
    from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
    from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
    from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
    from torchopt._lib import adam_op  # pylint: disable=no-name-in-module
E   ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
_________________ ERROR collecting tests/unit/test_schedule.py _________________
ImportError while importing test module '/project/tests/unit/test_schedule.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/test_schedule.py:18: in <module>
    ???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
    from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
    from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
    from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
    from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
    from torchopt._lib import adam_op  # pylint: disable=no-name-in-module
E   ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
______ ERROR collecting tests/unit/high_level/test_high_level_inplace.py _______
ImportError while importing test module '/project/tests/unit/high_level/test_high_level_inplace.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/high_level/test_high_level_inplace.py:25: in <module>
    ???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
    from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
    from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
    from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
    from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
    from torchopt._lib import adam_op  # pylint: disable=no-name-in-module
E   ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
_______ ERROR collecting tests/unit/low_level/test_low_level_inplace.py ________
ImportError while importing test module '/project/tests/unit/low_level/test_low_level_inplace.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/low_level/test_low_level_inplace.py:26: in <module>
    ???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
    from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
    from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
    from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
    from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
    from torchopt._lib import adam_op  # pylint: disable=no-name-in-module
E   ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory

---------- coverage: platform linux, python 3.7.13-final-0 -----------
Name                                                                                                    Stmts   Miss  Cover   Missing
-------------------------------------------------------------------------------------------------------------------------------------
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py                                   9      7    22%   18-26
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/__init__.py                              0      0   100%
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py                              1      0   100%
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py              21     18    14%   23-45
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py       1      0   100%
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py       75     72     4%   25-137
make: *** [pytest] Error 2
-------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                     107     97     9%

=========================== short test summary info ============================
ERROR unit/test_clip.py
ERROR unit/test_schedule.py
ERROR unit/high_level/test_high_level_inplace.py
ERROR unit/low_level/test_low_level_inplace.py
!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 4 errors in 0.82s ===============================
...

Shared library compiled with torch==1.12.0+cu116:

$ ls /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torch/lib
total 3.3G
-rwxr-xr-x 1 root root 1.2M Jul 28 04:01 libc10_cuda.so
-rwxr-xr-x 1 root root 751K Jul 28 04:01 libc10.so
-rwxr-xr-x 1 root root  25K Jul 28 04:01 libcaffe2_nvrtc.so
-rwxr-xr-x 1 root root 335M Jul 28 04:01 libcublasLt.so.11
-rwxr-xr-x 1 root root 150M Jul 28 04:01 libcublas.so.11
-rwxr-xr-x 1 root root 668K Jul 28 04:01 libcudart-45da57e3.so.11.0
-rwxr-xr-x 1 root root 124M Jul 28 04:01 libcudnn_adv_infer.so.8
-rwxr-xr-x 1 root root  92M Jul 28 04:01 libcudnn_adv_train.so.8
-rwxr-xr-x 1 root root 774M Jul 28 04:01 libcudnn_cnn_infer.so.8
-rwxr-xr-x 1 root root  85M Jul 28 04:01 libcudnn_cnn_train.so.8
-rwxr-xr-x 1 root root  86M Jul 28 04:01 libcudnn_ops_infer.so.8
-rwxr-xr-x 1 root root  68M Jul 28 04:01 libcudnn_ops_train.so.8
-rwxr-xr-x 1 root root 155K Jul 28 04:01 libcudnn.so.8
-rwxr-xr-x 1 root root 165K Jul 28 04:01 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root  44M Jul 28 04:01 libnvrtc-4dd39364.so.11.2
-rwxr-xr-x 1 root root 6.8M Jul 28 04:01 libnvrtc-builtins.so.11.6
-rwxr-xr-x 1 root root  43K Jul 28 04:01 libnvToolsExt-847d78f2.so.1
-rwxr-xr-x 1 root root  44K Jul 28 04:01 libshm.so
-rwxr-xr-x 1 root root 487M Jul 28 04:01 libtorch_cpu.so
-rwxr-xr-x 1 root root 216M Jul 28 04:01 libtorch_cuda_cpp.so
-rwxr-xr-x 1 root root 653M Jul 28 04:01 libtorch_cuda_cu.so
-rwxr-xr-x 1 root root 209M Jul 28 04:01 libtorch_cuda_linalg.so
-rwxr-xr-x 1 root root 163K Jul 28 04:01 libtorch_cuda.so
-rwxr-xr-x 1 root root  21K Jul 28 04:01 libtorch_global_deps.so
-rwxr-xr-x 1 root root  21M Jul 28 04:01 libtorch_python.so
-rwxr-xr-x 1 root root  16K Jul 28 04:01 libtorch.so
$ ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007ffcd44ea000)
        libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007f75c1243000)
        libc10_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10_cuda.so (0x00007f75c109b000)
        libcaffe2_nvrtc.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcaffe2_nvrtc.so (0x00007f75c123c000)
        libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007f75c1231000)
        libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007f75c122c000)
        libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007f75a704d000)
        libtorch_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda.so (0x00007f75c120c000)
        libtorch_cuda_cpp.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cpp.so (0x00007f7599db0000)
        libtorch_cuda_cu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cu.so (0x00007f757237b000)
        libtorch_cuda_linalg.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_linalg.so (0x00007f7565905000)
        libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007f75c1203000)
        libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007f7564957000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f756474f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7564533000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f756432f000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f7564027000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f7563d25000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f7563aff000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f75638e9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f756351b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f75c1199000)
        libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007f75632f1000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00007f7561e96000)
        libnvrtc-4dd39364.so.11.2 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvrtc-4dd39364.so.11.2 (0x00007f755f075000)
        libcudart-45da57e3.so.11.0 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudart-45da57e3.so.11.0 (0x00007f755edcd000)
        libnvToolsExt-847d78f2.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007f755ebc2000)
        libcudnn.so.8 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudnn.so.8 (0x00007f755e99a000)
        libcublas.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublas.so.11 (0x00007f755521c000)
        libcublasLt.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublasLt.so.11 (0x00007f75401b6000)
$ patchelf --print-rpath /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
$ORIGIN/../../torch/lib:$ORIGIN/../../torchopt.libs

Then deploy with torch==1.12.0+cpu:

$ ls /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torch/lib
total 496M
-rwxr-xr-x 1 root root 269K Jul 28 04:02 libbackend_with_compiler.so
-rwxr-xr-x 1 root root 766K Jul 28 04:02 libc10.so
-rwxr-xr-x 1 root root 165K Jul 28 04:02 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 228K Jul 28 04:02 libjitbackend_test.so
-rwxr-xr-x 1 root root  35K Jul 28 04:02 libshm.so
-rwxr-xr-x 1 root root 588K Jul 28 04:02 libtorchbind_test.so
-rwxr-xr-x 1 root root 476M Jul 28 04:02 libtorch_cpu.so
-rwxr-xr-x 1 root root 8.6K Jul 28 04:02 libtorch_global_deps.so
-rwxr-xr-x 1 root root  19M Jul 28 04:02 libtorch_python.so
-rwxr-xr-x 1 root root 7.1K Jul 28 04:02 libtorch.so
$ ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007ffd5997a000)
        libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007ff16efde000)
        libc10_cuda.so => not found
        libcaffe2_nvrtc.so => not found
        libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007ff16efce000)
        libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007ff16efcb000)
        libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007ff155b9a000)
        libtorch_cuda.so => not found
        libtorch_cuda_cpp.so => not found
        libtorch_cuda_cu.so => not found
        libtorch_cuda_linalg.so => not found
        libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007ff16efc5000)
        libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007ff154dc0000)
        librt.so.1 => /lib64/librt.so.1 (0x00007ff154bb8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff15499c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007ff154798000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff154490000)
        libm.so.6 => /lib64/libm.so.6 (0x00007ff15418e000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007ff153f68000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff153d52000)
        libc.so.6 => /lib64/libc.so.6 (0x00007ff153984000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff16ef39000)
        libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007ff15375a000)
$ patchelf --print-rpath /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
$ORIGIN/../../torch/lib:$ORIGIN/../../torchopt.libs
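
The "not found" entries follow from how the rpath printed by patchelf is resolved: `$ORIGIN` stands for the directory containing the extension module, so the loader looks in the installed torch/lib, which in the CPU build has no CUDA libraries. Below is a simplified sketch of that lookup. The helper names `expand_rpath` and `missing_libraries` are hypothetical, and the real dynamic loader also consults other locations (for example LD_LIBRARY_PATH and the system default directories), which this sketch ignores.

```python
import os
from typing import List


def expand_rpath(rpath: str, library_path: str) -> List[str]:
    """Expand $ORIGIN in a colon-separated rpath for the given library file."""
    origin = os.path.dirname(os.path.abspath(library_path))
    dirs = []
    for entry in rpath.split(":"):
        if not entry:
            continue
        expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        # normpath collapses ".." textually; symlinks are not resolved here.
        dirs.append(os.path.normpath(expanded))
    return dirs


def missing_libraries(needed: List[str], search_dirs: List[str]) -> List[str]:
    """Return the needed library names that exist in none of search_dirs."""
    return [
        name
        for name in needed
        if not any(os.path.exists(os.path.join(d, name)) for d in search_dirs)
    ]
```

With the rpath shown above and a CPU-only torch/lib directory, `missing_libraries` would report names such as libc10_cuda.so, matching the ImportError in the test log.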

Expected behavior

The extension should link only against libtorch.so, so that the built shared library can be imported regardless of which torch build (CPU or CUDA) is installed.

Screenshots

Test torchopt wheels (built with torch==1.12.0+cu116) with torch==1.12.0+cpu.

System info

N/A

Additional context

N/A

Reason and Possible fixes

N/A

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

In our .cxx and .cu code, we have only one include directive, #include <torch/extension.h>, and we reference only torch::Tensor and AT_DISPATCH_FLOATING_TYPES. Yet the built shared libraries link against far more libraries than expected.

The unexpected linkage is caused by:

https://github.com/metaopt/TorchOpt/blob/583666157d74d6de6b18be6c4cdc5ad291823e85/CMakeLists.txt#L144-L149

https://github.com/metaopt/TorchOpt/blob/583666157d74d6de6b18be6c4cdc5ad291823e85/src/adam_op/CMakeLists.txt#L47-L50

We glob all .so files in torch/lib and link every one of them into our C++ extension, which is why the CUDA-only libraries end up as dependencies. Fixed by PR #45.
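
To illustrate the difference between globbing and naming libraries explicitly, here is a small Python sketch. It is only an analogy for the CMake logic and is not necessarily what PR #45 does: the function names, the `*.so` pattern, and the `wanted` list are all assumptions made for the example.

```python
import glob
import os
from typing import Iterable, List


def glob_link_libraries(torch_lib_dir: str) -> List[str]:
    """Mimic the problematic approach: pick up every .so file in the directory."""
    return sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(torch_lib_dir, "*.so"))
    )


def explicit_link_libraries(torch_lib_dir: str, wanted: Iterable[str]) -> List[str]:
    """Keep only the explicitly requested libraries that are actually present."""
    return [
        name
        for name in wanted
        if os.path.isfile(os.path.join(torch_lib_dir, name))
    ]
```

Against a CUDA build of torch, the glob variant would return libc10_cuda.so, libtorch_cuda.so and the rest, while `explicit_link_libraries(torch_lib_dir, ["libtorch.so"])` would return only the library named in the expected behavior.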

cc @JieRen98