NVIDIA/cuda-quantum

cudaq.observe causes a memory leak

kaimatzu opened this issue · 5 comments

Required prerequisites

  • Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
  • If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

Running cudaq.observe repeatedly leaks memory: the process RSS grows on every call and is never released. I think this is similar to the issue in #1770.

Steps to reproduce the bug

import os

import psutil

import cudaq
from cudaq import spin

# Utility function for printing memory usage
def print_memory_usage(message):
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    mem_usage_mb = mem_info.rss / (1024 ** 2)  # Convert bytes to MB
    print(f"\r{message}, Memory Usage: {mem_usage_mb:.2f} MB", end="")

# Sample kernel I was using for my model (builder mode)
kernel, thetas = cudaq.make_kernel(list)
qubit_count = 1  # Set to 1 for now
qubits = kernel.qalloc(qubit_count)

# Apply variational gate parameters optimized during training
for i in range(qubit_count):
    kernel.rx(thetas[2 * i], qubits[i])
    kernel.rz(thetas[2 * i + 1], qubits[i])

# Entangle the qubits
for i in range(1, qubit_count):
    kernel.cx(qubits[i], qubits[i - 1])

pauli_z_hamiltonians = [spin.z(i) for i in range(qubit_count)]
for i in range(100000):
    print_memory_usage(f"Iteration {i}")

    hamiltonians = pauli_z_hamiltonians

    # result = cudaq.observe(kernel, spin.z(0), [0.5, 0.5])
    for hamiltonian in hamiltonians:
        # Parameter list is sized for up to 4 qubits; only the first
        # 2 * qubit_count entries are read by the kernel above.
        cudaq.observe(kernel, hamiltonian, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])

The memory usage increases incrementally on every iteration and is never deallocated.

Output: Iteration 99999, Memory Usage: 1220.81 MB  << keeps increasing slightly every iteration

Even after the loop finishes, the memory is not released:

print_memory_usage("After testing")
Output: After testing, Memory Usage: 1220.81 MB  << no deallocation
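
As a sanity check, forcing Python's garbage collector and re-measuring can rule out lingering Python-level garbage (a minimal sketch, reusing the utility above):

import gc

gc.collect()
print_memory_usage("After gc.collect()")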

Expected behavior

I expected the memory used by each observe call to be deallocated once the call returns. This may be an issue with the runtime module in general, since #1770 shows the same behavior; I haven't looked at the source code much yet.

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

  • CUDA Quantum version: CUDA-Q Version 0.7.1 (https://github.com/NVIDIA/cuda-quantum 1f8dd79)
  • Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
  • C++ compiler: g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  • Operating system: Linux 6.1.85+

Suggestions

No response

Hi, I'm also experiencing this issue when using cudaq.observe() with a parameterized kernel -- quite significantly so with larger circuits. It seems much less pronounced when the kernel is created with @cudaq.kernel; however, I'm not sure how to replicate a parameterized kernel in that style without having to build the circuit each time by passing in all the args when cudaq.observe() is invoked (and I'd like to avoid this, as I'm calling cudaq.observe() constantly in my optimization loop). Are there any recommendations for alternative methods to overcome this issue?
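
For reference, here is roughly what I mean by the decorator style -- a minimal sketch mirroring the builder kernel above (param_kernel is just an illustrative name, and I may be misunderstanding the intended pattern):

import cudaq
from cudaq import spin

@cudaq.kernel
def param_kernel(thetas: list[float]):
    qubit_count = 1
    qubits = cudaq.qvector(qubit_count)
    # Apply the variational gate parameters
    for i in range(qubit_count):
        rx(thetas[2 * i], qubits[i])
        rz(thetas[2 * i + 1], qubits[i])
    # Entangle the qubits
    for i in range(1, qubit_count):
        x.ctrl(qubits[i], qubits[i - 1])

# The kernel body is compiled once at definition time; each observe
# call only passes new parameter values.
result = cudaq.observe(param_kernel, spin.z(0), [0.5, 0.5])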

Hi @bharat-thotakura and @kaimatzu - I believe this should be fixed in the latest nightly image now. Would you be willing to docker pull nvcr.io/nvidia/nightly/cuda-quantum:latest and retry your applications? (I'm pretty sure the one that @kaimatzu posted should be fixed or greatly improved now, but I'm not sure about @bharat-thotakura.) If you are able to re-pull the image and you run into other issues, please feel free to open a new GitHub issue or discussion item.
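
To double-check that the container picked up the new build, you can print the installed version from Python:

import cudaq

print(cudaq.__version__)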

Sure thing! I'll get back to you with the results later, @bmhowe23.

Hi @bmhowe23, thank you for the fix! After testing with the new image, my parameterized builder-mode kernel now matches the memory usage of the @cudaq.kernel-mode circuit, and, more importantly, memory no longer compounds and grows drastically with every new circuit after calling cudaq.observe() on a previous one in a long VQE-style optimization loop. I will let you know if I encounter any further issues!

Hello @bmhowe23. My model, which runs a parameterized 4-qubit VQC, seems to be working and is no longer blowing up in memory usage. I think the issue is solved. I'll let you know if I spot any more issues.