Unable to use all GPUs with Python and `nvidia-mgpu` target
bebora opened this issue · 2 comments
Required prerequisites
- Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
- Make sure you've read the documentation. Your issue may be addressed there.
- Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
- If possible, make a PR with a failing test to give us a starting point to work on!
Describe the bug
The documentation states that the `nvidia-mgpu` target can be used to distribute the state vector between available GPUs. The state vector is distributed as expected when using CUDA-Q for C++; however, I'm unable to achieve the same distribution using CUDA-Q for Python.
Steps to reproduce the bug
I'm using the Singularity image on a machine with two Tesla V100S GPUs.
1. Create a `cuda-quantum.def` file with the following content:

   ```
   Bootstrap: docker
   From: nvcr.io/nvidia/quantum/cuda-quantum:0.7.0

   %runscript
   mount devpts /dev/pts -t devpts
   cp -r /home/cudaq/* .
   bash
   ```

2. Build the image with `singularity build --fakeroot cuda-quantum.sif cuda-quantum.def`.
3. Run the image with `singularity shell --nv --no-mount hostfs ./cuda-quantum.sif`.
4. Create a file `cuquantum_backends.py` copying the code from the Python examples (a sketch of that code is shown below this list).
5. Create a file `cuquantum_backends.cpp` copying the code from the C++ examples.
6. Open another terminal to observe the output of `watch -n 1 nvidia-smi`.
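
For reference, the example in question samples a GHZ state. A minimal sketch of what `cuquantum_backends.py` contains (an approximation using the kernel-decorator API from the docs, not the verbatim example file):

```python
import cudaq

qubit_count = 30  # later changed to 31, see below


@cudaq.kernel
def ghz(qubit_count: int):
    # Prepare an n-qubit GHZ state and measure all qubits.
    qubits = cudaq.qvector(qubit_count)
    h(qubits[0])
    for i in range(1, qubit_count):
        x.ctrl(qubits[0], qubits[i])
    mz(qubits)


counts = cudaq.sample(ghz, qubit_count, shots_count=100)
print(counts)
```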
30 qubits
Edit the two files to use 30 qubits (`qubit_count = 30` and `auto counts = cudaq::sample(/*shots=*/100, ghz{}, 30);`).
Compile and run the C++ version with the following commands:

```
nvq++ cuquantum_backends.cpp -o nvidia-mgpu.x --target nvidia-mgpu
mpirun -np 2 ./nvidia-mgpu.x
```
I can see a process for each GPU, peaking at about 8660MiB VRAM each.
Run the Python version with the following command:

```
mpirun -np 2 python3 cuquantum_backends.py --target=nvidia-mgpu
```
I can see two processes both running on GPU 0, peaking at 8528MiB VRAM each.
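
As a sanity check on the numbers (my arithmetic, assuming the default double-precision amplitudes of 16 bytes each): a 30-qubit state vector takes 2^30 × 16 B = 16 GiB, so ~8.5 GiB per process is consistent with the vector being split in half between the two ranks - it just looks like both halves are placed on GPU 0.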
31 qubits
Edit the two files to use 31 qubits (`qubit_count = 31` and `auto counts = cudaq::sample(/*shots=*/100, ghz{}, 31);`).
Compile and run the C++ version with the following commands:

```
nvq++ cuquantum_backends.cpp -o nvidia-mgpu.x --target nvidia-mgpu
mpirun -np 2 ./nvidia-mgpu.x
```
I can see a process for each GPU, peaking at about 16854MiB VRAM each.
Run the Python version with the following command:

```
mpirun -np 2 python3 cuquantum_backends.py --target=nvidia-mgpu
```
One process runs on GPU 0, peaking at about 16720MiB VRAM. The other process is terminated with the following error:

```
RuntimeError: [custatevec] %out of memory in addQubitsToState (line 210)
```
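
Again assuming 16-byte amplitudes: a 31-qubit state vector takes 2^31 × 16 B = 32 GiB, i.e. ~16 GiB per rank when split in two, matching the ~16720MiB peak. Two such halves no longer fit together on a single 32 GiB V100S, which would explain the out-of-memory error when both ranks end up on GPU 0.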
Expected behavior
I would expect the Python version to distribute the state vector across multiple GPUs, as the C++ version does, rather than being limited to the memory of a single GPU.
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
- CUDA Quantum version: 0.7.0 (also 0.7.1, with a slightly different memory consumption)
- Python version: 3.10.12
- Operating system: RHEL 9.3
- nvidia-smi:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100S-PCIE-32GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   33C    P0             35W /  250W |       0MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100S-PCIE-32GB          Off |   00000000:C1:00.0 Off |                    0 |
| N/A   33C    P0             37W /  250W |       0MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
Suggestions
No response
Hi @bebora - per the docs, the Python interface is not expecting `=` in between `--target` and `nvidia-mgpu`. That is - I think you need to run it like this:

```
$ mpirun -np 2 python3 examples/python/cuquantum_backends.py --target nvidia-mgpu
```
Alternatively, you could try putting `cudaq.set_target('nvidia-mgpu')` directly in your Python code, too.
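
For example (a minimal sketch; setting the target in code makes it independent of command-line flags):

```python
import cudaq

# Select the multi-GPU state-vector backend in code,
# equivalent to passing `--target nvidia-mgpu` on the command line.
cudaq.set_target('nvidia-mgpu')
```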
That being said - we should probably make the Python command-line interface accept the same syntax as the C++ one for the target option.