[BUG] Separate CPU / CUDA wheels
XuehaiPan opened this issue · 2 comments
Describe the bug
Wheels built with torch==1.12.0+cu116 are incompatible with torch==1.12.0+cpu (see CI output https://github.com/metaopt/TorchOpt/runs/7553102052 for more details). Different torch builds ship with different libraries:
torch==1.12.0+cu116:
$ ls $SITE_PACKAGES/torch/lib
total 3.3G
-rwxr-xr-x 1 root root 1.2M Jul 28 04:01 libc10_cuda.so
-rwxr-xr-x 1 root root 751K Jul 28 04:01 libc10.so
-rwxr-xr-x 1 root root 25K Jul 28 04:01 libcaffe2_nvrtc.so
-rwxr-xr-x 1 root root 335M Jul 28 04:01 libcublasLt.so.11
-rwxr-xr-x 1 root root 150M Jul 28 04:01 libcublas.so.11
-rwxr-xr-x 1 root root 668K Jul 28 04:01 libcudart-45da57e3.so.11.0
-rwxr-xr-x 1 root root 124M Jul 28 04:01 libcudnn_adv_infer.so.8
-rwxr-xr-x 1 root root 92M Jul 28 04:01 libcudnn_adv_train.so.8
-rwxr-xr-x 1 root root 774M Jul 28 04:01 libcudnn_cnn_infer.so.8
-rwxr-xr-x 1 root root 85M Jul 28 04:01 libcudnn_cnn_train.so.8
-rwxr-xr-x 1 root root 86M Jul 28 04:01 libcudnn_ops_infer.so.8
-rwxr-xr-x 1 root root 68M Jul 28 04:01 libcudnn_ops_train.so.8
-rwxr-xr-x 1 root root 155K Jul 28 04:01 libcudnn.so.8
-rwxr-xr-x 1 root root 165K Jul 28 04:01 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 44M Jul 28 04:01 libnvrtc-4dd39364.so.11.2
-rwxr-xr-x 1 root root 6.8M Jul 28 04:01 libnvrtc-builtins.so.11.6
-rwxr-xr-x 1 root root 43K Jul 28 04:01 libnvToolsExt-847d78f2.so.1
-rwxr-xr-x 1 root root 44K Jul 28 04:01 libshm.so
-rwxr-xr-x 1 root root 487M Jul 28 04:01 libtorch_cpu.so
-rwxr-xr-x 1 root root 216M Jul 28 04:01 libtorch_cuda_cpp.so
-rwxr-xr-x 1 root root 653M Jul 28 04:01 libtorch_cuda_cu.so
-rwxr-xr-x 1 root root 209M Jul 28 04:01 libtorch_cuda_linalg.so
-rwxr-xr-x 1 root root 163K Jul 28 04:01 libtorch_cuda.so
-rwxr-xr-x 1 root root 21K Jul 28 04:01 libtorch_global_deps.so
-rwxr-xr-x 1 root root 21M Jul 28 04:01 libtorch_python.so
-rwxr-xr-x 1 root root 16K Jul 28 04:01 libtorch.so
torch==1.12.0+cpu:
$ ls $SITE_PACKAGES/torch/lib
total 496M
-rwxr-xr-x 1 root root 269K Jul 28 04:02 libbackend_with_compiler.so
-rwxr-xr-x 1 root root 766K Jul 28 04:02 libc10.so
-rwxr-xr-x 1 root root 165K Jul 28 04:02 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 228K Jul 28 04:02 libjitbackend_test.so
-rwxr-xr-x 1 root root 35K Jul 28 04:02 libshm.so
-rwxr-xr-x 1 root root 588K Jul 28 04:02 libtorchbind_test.so
-rwxr-xr-x 1 root root 476M Jul 28 04:02 libtorch_cpu.so
-rwxr-xr-x 1 root root 8.6K Jul 28 04:02 libtorch_global_deps.so
-rwxr-xr-x 1 root root 19M Jul 28 04:02 libtorch_python.so
-rwxr-xr-x 1 root root 7.1K Jul 28 04:02 libtorch.so
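The difference between the two listings can also be checked programmatically. The following is a minimal sketch, assuming two directories that hold the shared libraries of a CUDA build and a CPU build of torch; the function name `diff_torch_libs` is hypothetical and is not part of TorchOpt or PyTorch.

```python
from pathlib import Path


def diff_torch_libs(cuda_lib_dir, cpu_lib_dir):
    """Return (only_in_cuda, only_in_cpu) lists of shared-library file names.

    Hypothetical helper for illustration: it only compares file names,
    it does not inspect the libraries themselves.
    """
    cuda_libs = {p.name for p in Path(cuda_lib_dir).iterdir() if ".so" in p.name}
    cpu_libs = {p.name for p in Path(cpu_lib_dir).iterdir() if ".so" in p.name}
    return sorted(cuda_libs - cpu_libs), sorted(cpu_libs - cuda_libs)
```

Any library in the first returned list that the extension links against is likely to be missing once the wheel is installed next to the CPU build.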
In our .cxx and .cu code, we have only one include directive, #include <torch/extension.h>, and we reference only torch::Tensor and AT_DISPATCH_FLOATING_TYPES. But the built shared libraries link against many more libraries than expected:
$ ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffcd44ea000)
libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007f75c1243000)
libc10_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10_cuda.so (0x00007f75c109b000)
libcaffe2_nvrtc.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcaffe2_nvrtc.so (0x00007f75c123c000)
libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007f75c1231000)
libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007f75c122c000)
libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007f75a704d000)
libtorch_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda.so (0x00007f75c120c000)
libtorch_cuda_cpp.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cpp.so (0x00007f7599db0000)
libtorch_cuda_cu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cu.so (0x00007f757237b000)
libtorch_cuda_linalg.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_linalg.so (0x00007f7565905000)
libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007f75c1203000)
libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007f7564957000)
librt.so.1 => /lib64/librt.so.1 (0x00007f756474f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7564533000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f756432f000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f7564027000)
libm.so.6 => /lib64/libm.so.6 (0x00007f7563d25000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f7563aff000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f75638e9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f756351b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f75c1199000)
libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007f75632f1000)
libcuda.so.1 => /lib64/libcuda.so.1 (0x00007f7561e96000)
libnvrtc-4dd39364.so.11.2 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvrtc-4dd39364.so.11.2 (0x00007f755f075000)
libcudart-45da57e3.so.11.0 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudart-45da57e3.so.11.0 (0x00007f755edcd000)
libnvToolsExt-847d78f2.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007f755ebc2000)
libcudnn.so.8 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudnn.so.8 (0x00007f755e99a000)
libcublas.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublas.so.11 (0x00007f755521c000)
libcublasLt.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublasLt.so.11 (0x00007f75401b6000)
To Reproduce
See CI output https://github.com/metaopt/TorchOpt/runs/7553102052 for more details.
$ git clone git@github.com:XuehaiPan/TorchOpt.git && cd TorchOpt
$ git checkout cibuildwheel
$ pip3 install --upgrade cibuildwheel
$ PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu116" python3 -m cibuildwheel --platform linux --config-file=pyproject.toml
...
ls /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torch/lib
total 496M
-rwxr-xr-x 1 root root 269K Jul 28 04:02 libbackend_with_compiler.so
-rwxr-xr-x 1 root root 766K Jul 28 04:02 libc10.so
-rwxr-xr-x 1 root root 165K Jul 28 04:02 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 228K Jul 28 04:02 libjitbackend_test.so
-rwxr-xr-x 1 root root 35K Jul 28 04:02 libshm.so
-rwxr-xr-x 1 root root 588K Jul 28 04:02 libtorchbind_test.so
-rwxr-xr-x 1 root root 476M Jul 28 04:02 libtorch_cpu.so
-rwxr-xr-x 1 root root 8.6K Jul 28 04:02 libtorch_global_deps.so
-rwxr-xr-x 1 root root 19M Jul 28 04:02 libtorch_python.so
-rwxr-xr-x 1 root root 7.1K Jul 28 04:02 libtorch.so
ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffd5997a000)
libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007ff16efde000)
libc10_cuda.so => not found
libcaffe2_nvrtc.so => not found
libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007ff16efce000)
libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007ff16efcb000)
libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007ff155b9a000)
libtorch_cuda.so => not found
libtorch_cuda_cpp.so => not found
libtorch_cuda_cu.so => not found
libtorch_cuda_linalg.so => not found
libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007ff16efc5000)
libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007ff154dc0000)
librt.so.1 => /lib64/librt.so.1 (0x00007ff154bb8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff15499c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ff154798000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff154490000)
libm.so.6 => /lib64/libm.so.6 (0x00007ff15418e000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007ff153f68000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff153d52000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff153984000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff16ef39000)
libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007ff15375a000)
patchelf --print-rpath /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
$ORIGIN/../../torch/lib:$ORIGIN/../../torchopt.libs
make: Entering directory `/project'
/tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip show pytest &>/dev/null || (cd && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip install pytest --upgrade)
/tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip show pytest_cov &>/dev/null || (cd && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip install pytest_cov --upgrade)
/tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip show pytest_xdist &>/dev/null || (cd && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pip install pytest_xdist --upgrade)
cd tests && /tmp/tmp.3M50Q7bV1d/venv/bin/python3 -m pytest unit --cov torchopt --durations 0 -v --cov-report term-missing --color=yes
============================= test session starts ==============================
platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /tmp/tmp.3M50Q7bV1d/venv/bin/python3
cachedir: .pytest_cache
rootdir: /project
plugins: forked-1.4.0, cov-3.0.0, xdist-2.5.0
collecting ... collected 0 items / 4 errors
==================================== ERRORS ====================================
___________________ ERROR collecting tests/unit/test_clip.py ___________________
ImportError while importing test module '/project/tests/unit/test_clip.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/test_clip.py:25: in <module>
???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
from torchopt._lib import adam_op # pylint: disable=no-name-in-module
E ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
_________________ ERROR collecting tests/unit/test_schedule.py _________________
ImportError while importing test module '/project/tests/unit/test_schedule.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/test_schedule.py:18: in <module>
???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
from torchopt._lib import adam_op # pylint: disable=no-name-in-module
E ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
______ ERROR collecting tests/unit/high_level/test_high_level_inplace.py _______
ImportError while importing test module '/project/tests/unit/high_level/test_high_level_inplace.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/high_level/test_high_level_inplace.py:25: in <module>
???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
from torchopt._lib import adam_op # pylint: disable=no-name-in-module
E ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
_______ ERROR collecting tests/unit/low_level/test_low_level_inplace.py ________
ImportError while importing test module '/project/tests/unit/low_level/test_low_level_inplace.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/python/cp37-cp37m/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
/workspace/tests/unit/low_level/test_low_level_inplace.py:26: in <module>
???
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py:17: in <module>
from torchopt._src import accelerated_op_available, clip, combine, hook, schedule, visual
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py:16: in <module>
from torchopt._src.accelerated_op import accelerated_op_available
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py:20: in <module>
from torchopt._src.accelerated_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py:16: in <module>
from torchopt._src.accelerated_op.adam_op.adam_op import AdamOp
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py:22: in <module>
from torchopt._lib import adam_op # pylint: disable=no-name-in-module
E ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
---------- coverage: platform linux, python 3.7.13-final-0 -----------
Name Stmts Miss Cover Missing
-------------------------------------------------------------------------------------------------------------------------------------
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/__init__.py 9 7 22% 18-26
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/__init__.py 0 0 100%
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/__init__.py 1 0 100%
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/__init__.py 21 18 14% 23-45
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/__init__.py 1 0 100%
/tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_src/accelerated_op/adam_op/adam_op.py 75 72 4% 25-137
make: *** [pytest] Error 2
-------------------------------------------------------------------------------------------------------------------------------------
TOTAL 107 97 9%
=========================== short test summary info ============================
ERROR unit/test_clip.py
ERROR unit/test_schedule.py
ERROR unit/high_level/test_high_level_inplace.py
ERROR unit/low_level/test_low_level_inplace.py
!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 4 errors in 0.82s ===============================
...
Shared library compiled with torch==1.12.0+cu116:
$ ls /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torch/lib
total 3.3G
-rwxr-xr-x 1 root root 1.2M Jul 28 04:01 libc10_cuda.so
-rwxr-xr-x 1 root root 751K Jul 28 04:01 libc10.so
-rwxr-xr-x 1 root root 25K Jul 28 04:01 libcaffe2_nvrtc.so
-rwxr-xr-x 1 root root 335M Jul 28 04:01 libcublasLt.so.11
-rwxr-xr-x 1 root root 150M Jul 28 04:01 libcublas.so.11
-rwxr-xr-x 1 root root 668K Jul 28 04:01 libcudart-45da57e3.so.11.0
-rwxr-xr-x 1 root root 124M Jul 28 04:01 libcudnn_adv_infer.so.8
-rwxr-xr-x 1 root root 92M Jul 28 04:01 libcudnn_adv_train.so.8
-rwxr-xr-x 1 root root 774M Jul 28 04:01 libcudnn_cnn_infer.so.8
-rwxr-xr-x 1 root root 85M Jul 28 04:01 libcudnn_cnn_train.so.8
-rwxr-xr-x 1 root root 86M Jul 28 04:01 libcudnn_ops_infer.so.8
-rwxr-xr-x 1 root root 68M Jul 28 04:01 libcudnn_ops_train.so.8
-rwxr-xr-x 1 root root 155K Jul 28 04:01 libcudnn.so.8
-rwxr-xr-x 1 root root 165K Jul 28 04:01 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 44M Jul 28 04:01 libnvrtc-4dd39364.so.11.2
-rwxr-xr-x 1 root root 6.8M Jul 28 04:01 libnvrtc-builtins.so.11.6
-rwxr-xr-x 1 root root 43K Jul 28 04:01 libnvToolsExt-847d78f2.so.1
-rwxr-xr-x 1 root root 44K Jul 28 04:01 libshm.so
-rwxr-xr-x 1 root root 487M Jul 28 04:01 libtorch_cpu.so
-rwxr-xr-x 1 root root 216M Jul 28 04:01 libtorch_cuda_cpp.so
-rwxr-xr-x 1 root root 653M Jul 28 04:01 libtorch_cuda_cu.so
-rwxr-xr-x 1 root root 209M Jul 28 04:01 libtorch_cuda_linalg.so
-rwxr-xr-x 1 root root 163K Jul 28 04:01 libtorch_cuda.so
-rwxr-xr-x 1 root root 21K Jul 28 04:01 libtorch_global_deps.so
-rwxr-xr-x 1 root root 21M Jul 28 04:01 libtorch_python.so
-rwxr-xr-x 1 root root 16K Jul 28 04:01 libtorch.so
$ ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffcd44ea000)
libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007f75c1243000)
libc10_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10_cuda.so (0x00007f75c109b000)
libcaffe2_nvrtc.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcaffe2_nvrtc.so (0x00007f75c123c000)
libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007f75c1231000)
libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007f75c122c000)
libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007f75a704d000)
libtorch_cuda.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda.so (0x00007f75c120c000)
libtorch_cuda_cpp.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cpp.so (0x00007f7599db0000)
libtorch_cuda_cu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_cu.so (0x00007f757237b000)
libtorch_cuda_linalg.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cuda_linalg.so (0x00007f7565905000)
libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007f75c1203000)
libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007f7564957000)
librt.so.1 => /lib64/librt.so.1 (0x00007f756474f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7564533000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f756432f000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f7564027000)
libm.so.6 => /lib64/libm.so.6 (0x00007f7563d25000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f7563aff000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f75638e9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f756351b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f75c1199000)
libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007f75632f1000)
libcuda.so.1 => /lib64/libcuda.so.1 (0x00007f7561e96000)
libnvrtc-4dd39364.so.11.2 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvrtc-4dd39364.so.11.2 (0x00007f755f075000)
libcudart-45da57e3.so.11.0 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudart-45da57e3.so.11.0 (0x00007f755edcd000)
libnvToolsExt-847d78f2.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007f755ebc2000)
libcudnn.so.8 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcudnn.so.8 (0x00007f755e99a000)
libcublas.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublas.so.11 (0x00007f755521c000)
libcublasLt.so.11 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libcublasLt.so.11 (0x00007f75401b6000)
$ patchelf --print-rpath /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
$ORIGIN/../../torch/lib:$ORIGIN/../../torchopt.libs
Then deploy with torch==1.12.0+cpu:
$ ls /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torch/lib
total 496M
-rwxr-xr-x 1 root root 269K Jul 28 04:02 libbackend_with_compiler.so
-rwxr-xr-x 1 root root 766K Jul 28 04:02 libc10.so
-rwxr-xr-x 1 root root 165K Jul 28 04:02 libgomp-a34b3233.so.1
-rwxr-xr-x 1 root root 228K Jul 28 04:02 libjitbackend_test.so
-rwxr-xr-x 1 root root 35K Jul 28 04:02 libshm.so
-rwxr-xr-x 1 root root 588K Jul 28 04:02 libtorchbind_test.so
-rwxr-xr-x 1 root root 476M Jul 28 04:02 libtorch_cpu.so
-rwxr-xr-x 1 root root 8.6K Jul 28 04:02 libtorch_global_deps.so
-rwxr-xr-x 1 root root 19M Jul 28 04:02 libtorch_python.so
-rwxr-xr-x 1 root root 7.1K Jul 28 04:02 libtorch.so
$ ldd /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffd5997a000)
libc10.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libc10.so (0x00007ff16efde000)
libc10_cuda.so => not found
libcaffe2_nvrtc.so => not found
libshm.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libshm.so (0x00007ff16efce000)
libtorch.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch.so (0x00007ff16efcb000)
libtorch_cpu.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_cpu.so (0x00007ff155b9a000)
libtorch_cuda.so => not found
libtorch_cuda_cpp.so => not found
libtorch_cuda_cu.so => not found
libtorch_cuda_linalg.so => not found
libtorch_global_deps.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_global_deps.so (0x00007ff16efc5000)
libtorch_python.so => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libtorch_python.so (0x00007ff154dc0000)
librt.so.1 => /lib64/librt.so.1 (0x00007ff154bb8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff15499c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ff154798000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff154490000)
libm.so.6 => /lib64/libm.so.6 (0x00007ff15418e000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007ff153f68000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff153d52000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff153984000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff16ef39000)
libgomp-a34b3233.so.1 => /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/../../torch/lib/libgomp-a34b3233.so.1 (0x00007ff15375a000)
$ patchelf --print-rpath /tmp/tmp.3M50Q7bV1d/venv/lib/python3.7/site-packages/torchopt/_lib/adam_op.cpython-37m-x86_64-linux-gnu.so
$ORIGIN/../../torch/lib:$ORIGIN/../../torchopt.libs
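The unresolved dependencies in the ldd output above can be extracted automatically, for example as a CI check. This is a minimal sketch that assumes the usual `name => target` layout of ldd output; the function name `missing_libraries` is hypothetical, and the paths in the sample are shortened for illustration.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "libfoo.so => /path/libfoo.so (0x...)"
        # or "libfoo.so => not found"; other lines are skipped.
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Shortened sample in the same layout as the ldd output above.
sample = """\
libc10.so => /venv/lib/python3.7/site-packages/torch/lib/libc10.so (0x00007ff16efde000)
libc10_cuda.so => not found
libtorch_cuda.so => not found
"""
print(missing_libraries(sample))  # ['libc10_cuda.so', 'libtorch_cuda.so']
```

A non-empty result for a wheel tested against the CPU build of torch would point to the same failure as the ImportError in the test log.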
Expected behavior
The extension should link only against libtorch.so, so that the built shared libraries can be imported regardless of which torch build (CPU or CUDA) is installed.
Screenshots
Test torchopt wheels (built with torch==1.12.0+cu116) with torch==1.12.0+cpu.
System info
import torchopt, numpy, sys
print(torchopt.__version__, numpy.__version__, sys.version, sys.platform)
N/A
Additional context
N/A
Reason and Possible fixes
N/A
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)
I created a discussion on the PyTorch forum https://discuss.pytorch.org/t/how-to-build-a-c-cuda-extension-capable-with-different-pytorch-e-g-cpu-cu102-cu116/157681.
In our .cxx and .cu code, we only have one include directive #include <torch/extension.h> and only referenced torch::Tensor and AT_DISPATCH_FLOATING_TYPES. But the built shared libraries are linking against too many libraries than expected.
The unexpected linkage is caused by our build configuration: we glob all .so files in torch/lib and link them with our C++ extension. Fixed by PR #45.
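The content of PR #45 is not reproduced here. As a rough illustration of the general idea of replacing a glob with an explicit list, the sketch below selects only allowlisted libraries from a torch/lib directory. The function name `select_link_libs` and the default allowlist are assumptions for illustration, not the project's actual build logic.

```python
from pathlib import Path

# Assumed allowlist for illustration only; the set a real build needs may differ.
ASSUMED_LINK_LIBS = ("libc10.so", "libtorch.so", "libtorch_cpu.so", "libtorch_python.so")


def select_link_libs(torch_lib_dir, allowlist=ASSUMED_LINK_LIBS):
    """Return paths of allowlisted libraries that exist in torch_lib_dir.

    Unlike globbing '*.so', this never picks up CUDA-only libraries
    such as libc10_cuda.so unless they are listed explicitly.
    """
    lib_dir = Path(torch_lib_dir)
    return [lib_dir / name for name in allowlist if (lib_dir / name).is_file()]
```

With a list like this, the same extension build would request the same libraries whether the build environment has the CPU or the CUDA build of torch installed.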
cc @JieRen98