Missing rpath links and dependencies for cuda-quantum PyPI package
JLHelm opened this issue · 7 comments
Required prerequisites
- Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
- Make sure you've read the documentation. Your issue may be addressed there.
- Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
- If possible, make a PR with a failing test to give us a starting point to work on!
Describe the bug
Hi all,
I'm writing a pybind11-based PyPI package that dynamically links to the cudaq libraries. Since these libraries are also pip-installable from PyPI, I will list cuda-quantum as a Python requirement and link to its binaries.
Almost everything works, and my package can be distributed via pip. However, I notice that the cuda-quantum PyPI package doesn't resolve all of its linkages, and some of the libraries it needs are missing altogether:
$ docker run -it --entrypoint="" ubuntu:jammy bash
# apt update && apt install python3-pip python3-venv
# python3 -m venv /opt/venv/
# source /opt/venv/bin/activate
# pip install cuda-quantum
# for e in `find /opt/venv/lib/python3.10/site-packages/ -type f -name "*nvqir*.so*"` ; do echo "===$e===" ; ldd $e | grep "not found"; done
yields
===/opt/venv/lib/python3.10/site-packages/cuda_quantum.libs/libnvqir-49b52344.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so===
libcustatevec.so.1 => not found
libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet.so===
libcutensornet.so.2 => not found
libcutensor.so.1 => not found
libcudart.so.11.0 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp64.so===
libcustatevec.so.1 => not found
libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-qpp.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-nvidia-mgpu.so===
libcustatevec.so.1 => not found
libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-dm.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet-mps.so===
libcutensornet.so.2 => not found
libcutensor.so.1 => not found
libcudart.so.11.0 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir.so===
So several of the NVIDIA libraries fail to resolve their dependencies on one another, and some link to cuBLAS, which is not installed. (Running pip install nvidia-cublas-cu11 makes it available on disk, but ldd still reports it as not found.)
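For anyone who wants to run the same check from Python (for example in a packaging test), here is a minimal sketch that extracts the unresolved sonames from ldd output. The function names are hypothetical, and the "resolved" line in the sample is illustrative; the two "not found" lines are taken from the output above.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reported as 'not found' (in order, no duplicates)."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints one dependency per line as "<soname> => <path or 'not found'>".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found" and name not in missing:
            missing.append(name)
    return missing


def ldd_missing(library_path: str) -> list[str]:
    """Run ldd on a shared library and return its unresolved dependencies."""
    result = subprocess.run(["ldd", library_path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Sample input: the first line is an illustrative resolved entry,
# the other two are copied from the report above.
SAMPLE = (
    "\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000000000000000)\n"
    "\tlibcustatevec.so.1 => not found\n"
    "\tlibcublas.so.11 => not found\n"
)

print(missing_libraries(SAMPLE))
```

Calling ldd_missing() on each libnvqir-*.so file would give the same information as the shell loop, in a form that a test can assert on.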
Initially I had similar issues with my own compiled pip-installed binaries. I fixed it by adding to the binary RPATH in the pybind11 CMake:
set_target_properties(pymodule
PROPERTIES
BUILD_RPATH "$ORIGIN:$ORIGIN/lib/:$ORIGIN/cuquantum/lib/"
OUTPUT_NAME module
)
and this worked in my case. However, paths added to my binary's rpath are not searched when resolving cuda-quantum's own linkages, so I can't fix this at my end.
Also, running
apt install patchelf
patchelf --add-rpath '$ORIGIN/../cuquantum/lib/' /opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so
patchelf --add-rpath '$ORIGIN/../nvidia/cublas/lib/' /opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so
fixed the problem on the basic jammy image above, so the rpath solution really should work.
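The $ORIGIN-relative values passed to patchelf above can be derived mechanically from where the library and its dependency sit on disk. The following is a small sketch of that calculation; the helper name is hypothetical, and the paths are the ones from the jammy venv used in this report.

```python
import posixpath


def origin_relative_rpath(library_path: str, dependency_dir: str) -> str:
    """Build an rpath entry that points from a library's directory to dependency_dir.

    $ORIGIN expands at load time to the directory containing the library,
    so the entry is the relative path between the two directories.
    """
    library_dir = posixpath.dirname(posixpath.normpath(library_path))
    relative = posixpath.relpath(posixpath.normpath(dependency_dir), start=library_dir)
    return "$ORIGIN" if relative == "." else "$ORIGIN/" + relative


SITE = "/opt/venv/lib/python3.10/site-packages"
LIB = SITE + "/lib/libnvqir-custatevec-fp32.so"

print(origin_relative_rpath(LIB, SITE + "/cuquantum/lib"))
print(origin_relative_rpath(LIB, SITE + "/nvidia/cublas/lib"))
```

The results match the two entries added with patchelf (minus the trailing slash), which only hold as long as all three packages are installed under the same site-packages directory.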
NB: I realise the PyPI page recommends setting LD_LIBRARY_PATH, but this is typically considered harmful, and I personally don't want my users to have to do anything more than pip install. It would also point system-wide builds at Python packages, which is probably not good.
As such, I suggest adding appropriate rpaths to the various libnvqir-<backend>.so files at build time via CMake. These shouldn't interfere with standard (non-pip/PyPI) system-wide install scenarios and will fix PyPI install scenarios.
I also suggest adding nvidia-cublas-cu11 as a dependency.
Steps to reproduce the bug
$ docker run -it --entrypoint="" ubuntu:jammy bash
# apt update && apt install python3-pip python3-venv
# python3 -m venv /opt/venv/
# source /opt/venv/bin/activate
# pip install cuda-quantum
# for e in `find /opt/venv/lib/python3.10/site-packages/ -type f -name "*nvqir*.so*"` ; do echo "===$e===" ; ldd $e | grep "not found"; done
Expected behavior
===/opt/venv/lib/python3.10/site-packages/cuda_quantum.libs/libnvqir-49b52344.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp64.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-qpp.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-nvidia-mgpu.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-dm.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet-mps.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir.so===
(i.e. linkages resolved)
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
- CUDA Quantum version: 0.6.0-0.7.0
- Python version: Python 3.10.12
- C++ compiler: NA
- Operating system: ubuntu:jammy
Suggestions
I suggest adding appropriate rpaths to the various libnvqir-<backend>.so files at build time via CMake. These shouldn't interfere with standard (non-pip/PyPI) system-wide install scenarios and will fix PyPI install scenarios.
I also suggest adding nvidia-cublas-cu11 as a dependency.
Hi @JLHelm - the reason this approach will not work for us right now is that the following PyPI packages do not have aarch64 support:
- nvidia-cublas-cu11 (needed by libnvqir-custatevec-*.so)
- nvidia-cusolver-cu11 (needed by libnvqir-tensornet-*.so)
- nvidia-cuda-runtime-cu11 (needed by libnvqir-tensornet-*.so)
The instructions provided on the cuda-quantum PyPI project were written to work for both x86_64 environments and aarch64 environments.
Hi @bmhowe23
I see that would block adding the cublas dependency.
Would adding the rpaths cause any problems though? It would simplify the x86_64 installation(?)
Yes, that's true. And it appears that pip wheels can have architecture-specific dependencies, so that might simplify the x86_64 installation procedure without impacting the aarch64 installation procedure. We'll investigate further using #1602 for now.
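To illustrate the architecture-specific dependency idea: pip understands PEP 508 environment markers, so a requirement can be limited to x86_64 and skipped on aarch64. Below is a hedged sketch of what such a requirements list could look like in a setup.py. The package names are the three listed earlier in this thread; the helper name is hypothetical, no version pins are implied, and this is not necessarily what #1602 does.

```python
# PEP 508 marker: the requirement applies only when platform.machine() is "x86_64".
X86_64_ONLY = 'platform_machine == "x86_64"'

# Packages named in this thread as lacking aarch64 wheels.
CUDA11_RUNTIME_PACKAGES = [
    "nvidia-cublas-cu11",
    "nvidia-cusolver-cu11",
    "nvidia-cuda-runtime-cu11",
]


def x86_64_only_requirements() -> list[str]:
    """Return requirement strings that pip installs on x86_64 and ignores elsewhere."""
    return [f"{name}; {X86_64_ONLY}" for name in CUDA11_RUNTIME_PACKAGES]


for requirement in x86_64_only_requirements():
    print(requirement)
```

The resulting strings could be passed to install_requires in setup() or listed under dependencies in pyproject.toml; on aarch64 the marker evaluates to false and the installation procedure there is unchanged.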
Hi @JLHelm - there is some concern about setting the rpath to a directory outside of our own package. For example, that would not cover the case where the user has preinstalled some pip packages at the system level and cuda-quantum is installed at the user level. E.g.
# This will place the cuQuantum libs in /usr/local/lib/python3.10/dist-packages/cuquantum/lib/
$ sudo pip install cuquantum-cu11
# This will place the CUDA-Q libs in $HOME/.local/lib/python3.10/site-packages/lib/, so the $ORIGIN/../cuquantum/lib/ rpath would not work.
$ pip install --user cuda-quantum
The key point is that pip packages need to honor system-level packages, user-level packages, and various virtual environment configurations where some packages may be in one place and other packages are in entirely different places. Relative paths that go outside of a single package cannot support all of those configurations.
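One way to cope with packages living in different locations is to resolve them at import time instead of baking a relative path into the binary. The sketch below searches every directory on sys.path (which covers system, user, and virtual-environment site directories) for a package-relative library directory. The function name is hypothetical, and this is an illustration of the idea only, not a description of what CUDA-Q or #1602 actually implements.

```python
import os
import sys


def find_lib_dirs(relative_dirs, roots=None):
    """Locate package library directories across all import roots.

    relative_dirs: paths relative to a site-packages root, e.g. "cuquantum/lib".
    roots: directories to search; defaults to sys.path.
    Returns the first existing match for each relative directory.
    """
    search_roots = [r for r in (sys.path if roots is None else roots) if r]
    found = []
    for relative in relative_dirs:
        for root in search_roots:
            candidate = os.path.join(root, relative)
            if os.path.isdir(candidate):
                found.append(candidate)
                break  # the first root wins, mirroring import precedence
    return found


if __name__ == "__main__":
    # Directory names taken from the paths discussed in this thread.
    print(find_lib_dirs(["cuquantum/lib", "nvidia/cublas/lib"]))
```

A package's __init__.py could then load the libraries it finds (for example with ctypes.CDLL and RTLD_GLOBAL) before importing its extension module, so the system-level cuquantum-cu11 and user-level cuda-quantum case above would still resolve.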
So I guess my question is - did you specifically care about the rpath, or did you just want a plain pip install cuda-quantum to work without any conda dependencies? If it's the latter, I think the current changes in #1602 may work for you (and would definitely improve our x86 user experience w/ installation issues), but if it's the former, that may not lend itself to a multi-location package system like pip.
Hi @bmhowe23,
It's the latter; so long as it installs all requirements and links when it runs I'm very happy.
I like your solution more anyway. I had thought to solve the site-/dist- packages problem (in my code) by just adding additional rpaths, but your method looks more pythonic and deterministic so I'll probably try it for myself!
Thanks.