Missing rpath links and dependencies for cuda-quantum PyPi package


JLHelm opened this issue 3 months ago · 7 comments


Describe the bug

Hi all,

I'm writing a pybind11-based PyPi package which dynamically links to the cudaq libraries. Since these libraries are also pip-installable from PyPi, I will list cuda-quantum as a Python requirement and link against its binaries.

Almost everything works, and my package can be distributed via pip. However, I notice that the cuda-quantum PyPi package doesn't resolve all of its linkages, and some libraries are missing entirely:

$ docker run -it --entrypoint="" ubuntu:jammy bash
# apt update && apt install python3-pip python3-venv    
# python3 -m venv /opt/venv/    
# source /opt/venv/bin/activate    
# pip install cuda-quantum
#  for e in `find /opt/venv/lib/python3.10/site-packages/ -type f -name "*nvqir*.so*"` ; do echo "===$e===" ; ldd $e | grep "not found"; done

yields

===/opt/venv/lib/python3.10/site-packages/cuda_quantum.libs/libnvqir-49b52344.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so===
	libcustatevec.so.1 => not found
	libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet.so===
	libcutensornet.so.2 => not found
	libcutensor.so.1 => not found
	libcudart.so.11.0 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp64.so===
	libcustatevec.so.1 => not found
	libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-qpp.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-nvidia-mgpu.so===
	libcustatevec.so.1 => not found
	libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-dm.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet-mps.so===
	libcutensornet.so.2 => not found
	libcutensor.so.1 => not found
	libcudart.so.11.0 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir.so===

So several of the NVIDIA backend libraries do not resolve their dependencies, and some link against cuBLAS, which is not installed. (Running pip install nvidia-cublas-cu11 installs it, but ldd still reports it as not found.)
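The same check can be scripted rather than read off by eye. The following is a minimal sketch, not part of cuda-quantum: the helper names `parse_missing` and `report` are made up for illustration, and it assumes `ldd` prints unresolved dependencies in the `name => not found` form shown above.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def parse_missing(ldd_output: str) -> List[str]:
    """Return the sonames that an ldd listing marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.partition("=>")
        # Lines without '=>' (e.g. linux-vdso) carry no lookup result.
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def report(root: Path, pattern: str = "*nvqir*.so*") -> Dict[str, List[str]]:
    """Run ldd on every matching file under root (hypothetical helper)."""
    results = {}
    for shared_object in sorted(root.rglob(pattern)):
        proc = subprocess.run(
            ["ldd", str(shared_object)],
            capture_output=True, text=True, check=False,
        )
        results[str(shared_object)] = parse_missing(proc.stdout)
    return results
```

Calling `report(Path("/opt/venv/lib/python3.10/site-packages"))` inside the container above should list the same unresolved sonames per backend library.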

Initially I had similar issues with my own compiled pip-installed binaries. I fixed it by adding to the binary RPATH in the pybind11 CMake:

set_target_properties(pymodule
  PROPERTIES
  BUILD_RPATH "$ORIGIN:$ORIGIN/lib/:$ORIGIN/cuquantum/lib/"
  OUTPUT_NAME module
)

and this worked in my case. However, paths added to my binary's rpath are not searched when resolving cuda-quantum's own linkages, so I can't fix this at my end.

Also, running

apt install patchelf                                                                                                                     
patchelf --add-rpath '$ORIGIN/../cuquantum/lib/' /opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so        
patchelf --add-rpath '$ORIGIN/../nvidia/cublas/lib/' /opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so

fixed the problem on the basic jammy image above, so the rpath solution really should work.

NB: I realise the PyPi page recommends setting LD_LIBRARY_PATH, but this is typically considered harmful, and I personally don't want my users to have to do anything more than pip install. It would also point system-wide builds at Python packages, which is probably not good.

As such, I suggest adding appropriate rpaths to the various libnvqir-<backend>.so files at build time via CMake. These shouldn't interfere with standard (non-pip/PyPi) system-wide install scenarios and will fix PyPi install scenarios.

I also suggest adding nvidia-cublas-cu11 as a dependency.

Steps to reproduce the bug

$ docker run -it --entrypoint="" ubuntu:jammy bash
# apt update && apt install python3-pip python3-venv    
# python3 -m venv /opt/venv/    
# source /opt/venv/bin/activate    
# pip install cuda-quantum
#  for e in `find /opt/venv/lib/python3.10/site-packages/ -type f -name "*nvqir*.so*"` ; do echo "===$e===" ; ldd $e | grep "not found"; done

Expected behavior

===/opt/venv/lib/python3.10/site-packages/cuda_quantum.libs/libnvqir-49b52344.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp64.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-qpp.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-nvidia-mgpu.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-dm.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet-mps.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir.so===

(i.e. all linkages resolved)

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

CUDA Quantum version: 0.6.0-0.7.0
Python version: Python 3.10.12
C++ compiler: NA
Operating system: ubuntu:jammy

Suggestions

I suggest adding appropriate rpaths to the various libnvqir-<backend>.so files at build time via CMake. These shouldn't interfere with standard (non-pip/PyPi) system-wide install scenarios and will fix PyPi install scenarios.

I also suggest adding nvidia-cublas-cu11 as a dependency.

Answer 1 · 2024-05-02T16:57:58.000Z

Hi @JLHelm - the reason this approach will not work for us right now is that the following PyPi packages do not have aarch64 support:

nvidia-cublas-cu11 (needed by libnvqir-custatevec-*.so)
nvidia-cusolver-cu11 (needed by libnvqir-tensornet-*.so)
nvidia-cuda-runtime-cu11 (needed by libnvqir-tensornet-*.so)

The instructions provided on the cuda-quantum PyPi project were written to work for both x86_64 environments and aarch64 environments.

Answer 2 · 2024-05-02T17:59:37.000Z

Hi @bmhowe23

I see that would block adding the cublas dependency.

Would adding the rpaths cause any problems though? It would simplify the x86_64 installation(?)

Answer 3 · 2024-05-03T12:37:45.000Z

Yes, that's true. And it appears that pip wheels can have architecture-specific dependencies, so that might simplify the x86_64 installation procedure without impacting the aarch64 installation procedure. We'll investigate further using #1602 for now.
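For illustration, architecture-specific dependencies are expressed with PEP 508 environment markers. The sketch below is an assumption about how such a list could be built, not a description of what #1602 does; the variable and function names are hypothetical, and the package names are the three listed earlier in this thread.

```python
# Packages that, per this thread, lack aarch64 wheels on PyPi.
X86_ONLY_PACKAGES = [
    "nvidia-cublas-cu11",
    "nvidia-cusolver-cu11",
    "nvidia-cuda-runtime-cu11",
]


def with_machine_marker(requirement: str, machine: str = "x86_64") -> str:
    """Append a PEP 508 marker so pip only installs the requirement on `machine`."""
    return f'{requirement}; platform_machine == "{machine}"'


# A list of this shape could be passed as install_requires in setup().
install_requires = [with_machine_marker(name) for name in X86_ONLY_PACKAGES]

for line in install_requires:
    print(line)
```

On an aarch64 machine pip evaluates the marker as false and skips those requirements, so the existing aarch64 installation procedure would be unaffected.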

Answer 4 · 2024-05-03T14:02:37.000Z

Thanks @bmhowe23. Much appreciated.

Answer 5 · 2024-05-03T22:15:33.000Z

Hi @JLHelm - there is some concern about setting the rpath to a directory outside of our own package. For example, that would not cover the case where the user has preinstalled some pip packages at the system level and cuda-quantum is installed at the user level. E.g.

# This will place the cuQuantum libs in /usr/local/lib/python3.10/dist-packages/cuquantum/lib/
$ sudo pip install cuquantum-cu11

# This will place the CUDA-Q libs in $HOME/.local/lib/python3.10/site-packages/lib/, so the $ORIGIN/../cuquantum/lib/ rpath would not work.
$ pip install --user cuda-quantum

The key point is that pip packages need to honor system-level packages, user-level packages, and various virtual environment configurations where some packages may be in one place and other packages are in entirely different places. Relative paths that go outside of a single package cannot support all of those configurations.
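To make the multi-location point concrete, here is a rough sketch of looking a library up across every package directory Python knows about, instead of relying on one relative path. The function names are hypothetical and this is not taken from the cuda-quantum sources; it only uses the standard `site` and `sys` modules.

```python
import site
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_site_dirs() -> List[Path]:
    """Collect system-level, user-level and venv package directories, in order."""
    raw = []
    if hasattr(site, "getsitepackages"):
        raw.extend(site.getsitepackages())
    if hasattr(site, "getusersitepackages"):
        raw.append(site.getusersitepackages())
    raw.extend(
        p for p in sys.path
        if isinstance(p, str) and p.endswith(("site-packages", "dist-packages"))
    )
    unique = []
    for entry in raw:
        path = Path(entry)
        if path not in unique:
            unique.append(path)
    return unique


def find_library(
    soname: str,
    package_subdir: str,
    search_dirs: Optional[Iterable[Path]] = None,
) -> Optional[Path]:
    """Return the first <dir>/<package_subdir>/<soname> that exists, else None."""
    dirs = candidate_site_dirs() if search_dirs is None else search_dirs
    for base in dirs:
        candidate = Path(base) / package_subdir / soname
        if candidate.is_file():
            return candidate
    return None
```

With the sudo/--user layout above, `find_library("libcustatevec.so.1", "cuquantum/lib")` would be expected to find the copy under dist-packages even though cuda-quantum itself lives under the user's site-packages.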

So I guess my question is - did you specifically care about the rpath, or did you just want a plain pip install cuda-quantum to work without any conda dependencies? If it's the latter, I think the current changes in #1602 may work for you (and would definitely improve our x86 user experience w/ installation issues), but if it's the former, that may not lend itself to a multi-location package system like pip.

Answer 6 · 2024-05-04T17:17:43.000Z

Hi @bmhowe23,

It's the latter; so long as it installs all requirements and links when it runs I'm very happy.

I like your solution more anyway. I had thought to solve the site-/dist- packages problem (in my code) by just adding additional rpaths, but your method looks more pythonic and deterministic so I'll probably try it for myself!

Thanks.

Answer 7 · 2024-05-16T20:31:32.000Z

Hi @bmhowe23

FYI I have now managed to thoroughly test and implement similar changes via __init__.py in my project and all works well.
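For readers wondering what an `__init__.py`-based approach might look like, one common technique is to preload the dependent shared libraries with ctypes before importing the extension module, so the dynamic linker can satisfy later requests for the same soname from objects already in the process. The thread does not show the actual code, so the sketch below is an assumption; `preload_libraries` is a hypothetical name.

```python
import ctypes
from pathlib import Path
from typing import Iterable, List


def preload_libraries(paths: Iterable[Path]) -> List[Path]:
    """Load each existing shared library globally; return the ones that loaded.

    Order matters: list a library's own dependencies before the library itself.
    """
    loaded = []
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # package not installed in this location
        try:
            # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
            ctypes.CDLL(str(path), mode=ctypes.RTLD_GLOBAL)
        except OSError:
            continue  # present but unloadable, e.g. its own dependencies are missing
        loaded.append(path)
    return loaded
```

In a package's `__init__.py`, a call to this helper would come before the `import` of the compiled pybind11 module, with the paths discovered at runtime rather than hard-coded.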

Thanks again.