NVIDIA/cuda-quantum

Missing rpath links and dependencies for cuda-quantum PyPI package

JLHelm opened this issue · 7 comments

Describe the bug

Hi all,

I'm writing a pybind11-based PyPI package which dynamically links to the cudaq libraries. Since these libraries are also pip-installable from PyPI, I will list cuda-quantum as a Python requirement and link to its binaries.

Almost everything works, and my package can be distributed via pip. However, I notice that the cuda-quantum PyPI package doesn't resolve all linkages, and some libraries are missing entirely:

$ docker run -it --entrypoint="" ubuntu:jammy bash
# apt update && apt install python3-pip python3-venv    
# python3 -m venv /opt/venv/    
# source /opt/venv/bin/activate    
# pip install cuda-quantum
#  for e in `find /opt/venv/lib/python3.10/site-packages/ -type f -name "*nvqir*.so*"` ; do echo "===$e===" ; ldd $e | grep "not found"; done   

yields

===/opt/venv/lib/python3.10/site-packages/cuda_quantum.libs/libnvqir-49b52344.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so===
	libcustatevec.so.1 => not found
	libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet.so===
	libcutensornet.so.2 => not found
	libcutensor.so.1 => not found
	libcudart.so.11.0 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp64.so===
	libcustatevec.so.1 => not found
	libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-qpp.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-nvidia-mgpu.so===
	libcustatevec.so.1 => not found
	libcublas.so.11 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-dm.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet-mps.so===
	libcutensornet.so.2 => not found
	libcutensor.so.1 => not found
	libcudart.so.11.0 => not found
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir.so===

So various NVIDIA libraries are not being resolved against one another, and some link to cuBLAS, which is not available. (Running pip install nvidia-cublas-cu11 makes it available, but ldd still reports it as not found.)
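The same check can be scripted. The following is a minimal sketch in Python, assuming ldd is on the PATH and prints unresolved entries in the "name => not found" form shown above; the helper names missing_libs and ldd_missing are hypothetical and not part of cuda-quantum.

```python
import subprocess


def missing_libs(ldd_output: str) -> list[str]:
    """Return the sonames that an ldd listing reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "\tlibcublas.so.11 => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def ldd_missing(path: str) -> list[str]:
    """Run ldd on one shared object and list its unresolved dependencies."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libs(result.stdout)
```

Calling ldd_missing() on each libnvqir-*.so file should give the same information as the shell loop, in a form that a test suite can assert on.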

Initially I had similar issues with my own compiled pip-installed binaries. I fixed it by adding to the binary RPATH in the pybind11 CMake:

set_target_properties(pymodule
  PROPERTIES
  BUILD_RPATH "$ORIGIN:$ORIGIN/lib/:$ORIGIN/cuquantum/lib/"
  OUTPUT_NAME module
)

and this worked in my case. However, paths added to my binary's rpath are not searched when resolving cuda-quantum's own linkages, so I can't fix this at my end.

Also, running

apt install patchelf                                                                                                                     
patchelf --add-rpath '$ORIGIN/../cuquantum/lib/' /opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so        
patchelf --add-rpath '$ORIGIN/../nvidia/cublas/lib/' /opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so

fixed the problem on the above basic jammy image, so the rpath solution really should work.
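To make the two patchelf arguments less magic, here is a small sketch of how such $ORIGIN-relative entries can be derived. The function origin_rpath is a hypothetical helper, and the paths assume the virtual environment layout from the reproduction steps, where all packages share one site-packages directory.

```python
import posixpath


def origin_rpath(library: str, dependency_dir: str) -> str:
    """Express dependency_dir relative to the directory that holds library."""
    # $ORIGIN expands to the directory containing the loaded object,
    # so the relative path is computed from the library's parent directory.
    rel = posixpath.relpath(dependency_dir, start=posixpath.dirname(library))
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


site = "/opt/venv/lib/python3.10/site-packages"
lib = site + "/lib/libnvqir-custatevec-fp32.so"
print(origin_rpath(lib, site + "/cuquantum/lib"))      # $ORIGIN/../cuquantum/lib
print(origin_rpath(lib, site + "/nvidia/cublas/lib"))  # $ORIGIN/../nvidia/cublas/lib
```

The result only stays valid while the dependency sits at the same relative location, which holds here because everything was installed into a single site-packages tree.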

NB: I realise the PyPI page recommends setting LD_LIBRARY_PATH, but this is typically considered harmful, and I personally don't want my users to have to do anything more than pip install. It would also point system-wide builds at Python packages, which is probably not good.

As such, I suggest adding appropriate rpaths to the various libnvqir-<backend>.so files at build time via CMake. These shouldn't interfere with standard (non-pip/PyPI) system-wide install scenarios and will fix PyPI install scenarios.

I also suggest adding nvidia-cublas-cu11 as a dependency

Steps to reproduce the bug

$ docker run -it --entrypoint="" ubuntu:jammy bash
# apt update && apt install python3-pip python3-venv    
# python3 -m venv /opt/venv/    
# source /opt/venv/bin/activate    
# pip install cuda-quantum
#  for e in `find /opt/venv/lib/python3.10/site-packages/ -type f -name "*nvqir*.so*"` ; do echo "===$e===" ; ldd $e | grep "not found"; done

Expected behavior

===/opt/venv/lib/python3.10/site-packages/cuda_quantum.libs/libnvqir-49b52344.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp32.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-custatevec-fp64.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-qpp.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-nvidia-mgpu.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-dm.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir-tensornet-mps.so===
===/opt/venv/lib/python3.10/site-packages/lib/libnvqir.so===

(i.e., all linkages resolved)

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

  • CUDA Quantum version: 0.6.0-0.7.0
  • Python version: Python 3.10.12
  • C++ compiler: NA
  • Operating system: ubuntu:jammy

Suggestions

I suggest adding appropriate rpaths to the various libnvqir-<backend>.so files at build time via CMake. These shouldn't interfere with standard (non-pip/PyPI) system-wide install scenarios and will fix PyPI install scenarios.

I also suggest adding nvidia-cublas-cu11 as a dependency

Hi @JLHelm - this approach will not work for us right now because the following PyPI packages do not have aarch64 support.

  • nvidia-cublas-cu11 (needed by libnvqir-custatevec-*.so)
  • nvidia-cusolver-cu11 (needed by libnvqir-tensornet-*.so)
  • nvidia-cuda-runtime-cu11 (needed by libnvqir-tensornet-*.so)

The instructions provided on the cuda-quantum PyPI project page were written to work for both x86_64 and aarch64 environments.

Hi @bmhowe23

I see that would block adding the cublas dependency.

Would adding the rpaths cause any problems though? It would simplify the x86_64 installation(?)

Yes, that's true. And it appears that pip wheels can have architecture-specific dependencies, so that might simplify the x86_64 installation procedure without impacting the aarch64 installation procedure. We'll investigate further using #1602 for now.
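For reference, architecture-specific dependencies are expressed with PEP 508 environment markers. The sketch below only illustrates the marker syntax using the package names listed earlier in this thread; it is an assumption about what such a dependency list could look like, not the actual contents of #1602.

```python
# Packages named earlier in this thread as lacking aarch64 wheels.
CUDA_DEPS = [
    "nvidia-cublas-cu11",
    "nvidia-cusolver-cu11",
    "nvidia-cuda-runtime-cu11",
]


def with_marker(name: str, machine: str = "x86_64") -> str:
    """Attach a PEP 508 marker so pip installs the package only on `machine`."""
    return f'{name}; platform_machine == "{machine}"'


# Hypothetical value for the install_requires / dependencies field of a wheel.
install_requires = [with_marker(name) for name in CUDA_DEPS]

for requirement in install_requires:
    print(requirement)
```

On an aarch64 machine pip evaluates the marker to false and skips these entries, so the existing aarch64 instructions would be unaffected.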

Thanks @bmhowe23. Much appreciated.

Hi @JLHelm - there is some concern about setting the rpath to a directory outside of our own package. For example, that would not cover the case where the user has preinstalled some pip packages at the system level and cuda-quantum is installed at the user level. E.g.

# This will place the cuQuantum libs in /usr/local/lib/python3.10/dist-packages/cuquantum/lib/
$ sudo pip install cuquantum-cu11

# This will place the CUDA-Q libs in $HOME/.local/lib/python3.10/site-packages/lib/, so the $ORIGIN/../cuquantum/lib/ rpath would not work.
$ pip install --user cuda-quantum

The key point is that pip packages need to honor system-level packages, user-level packages, and various virtual environment configurations where some packages may be in one place and other packages are in entirely different places. Relative paths that go outside of a single package cannot support all of those configurations.
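One way to cope with several install locations is to search for the dependency at run time instead of encoding a single relative path. This is a rough sketch under that assumption; candidate_site_dirs and find_lib_dirs are hypothetical names, and the example subdirectory "cuquantum/lib" is taken from the paths quoted above.

```python
import os
import site
import sys


def candidate_site_dirs() -> list[str]:
    """Collect system, user, and sys.path directories without duplicates."""
    dirs = []
    # getsitepackages() is absent in some older virtualenv setups, so guard it.
    if hasattr(site, "getsitepackages"):
        dirs.extend(site.getsitepackages())
    dirs.append(site.getusersitepackages())
    dirs.extend(p for p in sys.path if p)

    unique = []
    for d in dirs:
        if d not in unique:
            unique.append(d)
    return unique


def find_lib_dirs(relative: str) -> list[str]:
    """Return every existing <site dir>/<relative> directory."""
    found = []
    for base in candidate_site_dirs():
        path = os.path.join(base, relative)
        if os.path.isdir(path):
            found.append(path)
    return found


if __name__ == "__main__":
    # Prints whichever locations actually contain the cuQuantum libraries.
    print(find_lib_dirs(os.path.join("cuquantum", "lib")))
```

Because the search runs in the interpreter that imports the package, it sees the same system-level, user-level, and virtual environment directories that pip installed into.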

So I guess my question is - did you specifically care about the rpath, or did you just want a plain pip install cuda-quantum to work without any conda dependencies? If it's the latter, I think the current changes in #1602 may work for you (and would definitely improve our x86 user experience w/ installation issues), but if it's the former, that may not lend itself to a multi-location package system like pip.

Hi @bmhowe23,

It's the latter; so long as it installs all requirements and links when it runs I'm very happy.

I like your solution more anyway. I had thought to solve the site-packages/dist-packages problem (in my code) by just adding additional rpaths, but your method looks more Pythonic and deterministic, so I'll probably try it for myself!

Thanks.

Hi @bmhowe23

FYI I have now managed to thoroughly test and implement similar changes via __init__.py in my project and all works well.

Thanks again.
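For readers wondering what an __init__.py based approach can look like, the following is a minimal sketch and not the code from #1602 or from either project. It assumes the dependency directories have already been located, and it relies on the dynamic loader reusing an already loaded library when a later one needs the same soname. The function name preload is hypothetical.

```python
import ctypes
import glob
import os


def preload(lib_dirs, pattern="*.so*"):
    """Load every matching shared library so later loads can reuse them."""
    loaded = []
    for directory in lib_dirs:
        for path in sorted(glob.glob(os.path.join(directory, pattern))):
            try:
                # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
                ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
                loaded.append(path)
            except OSError:
                # A library may itself depend on one that is not loaded yet;
                # a fuller version could retry the failures in a second pass.
                pass
    return loaded
```

Called at the top of a package's __init__.py, before the compiled extension module is imported, this would avoid both LD_LIBRARY_PATH and rpaths that point outside the package.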