NVIDIA/cuda-quantum

[Benchmarking] Kernel resource estimation to calculate gate counts and depth

anthony-santana opened this issue · 1 comment

Required prerequisites

  • Search the issue tracker to check if your feature has already been mentioned or rejected in other issues.

Describe the feature

We need a way to benchmark the synthesized circuit that our compiler actually emits. Metrics of interest are:

  1. The number of occurrences of each gate (we could have something like a GateCounts dictionary type to store this)
  2. The depth of the circuit

(1) seems straightforward -- just count the occurrences of each op in the MLIR -- but is complicated by control flow (a concrete example follows the proposed API below).
(2) is murkier to me, as it requires some notion of which gates may run in parallel. Would this be the length of the longest path through the circuit's dependency DAG?
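For what it's worth, here is a minimal sketch of that longest-path definition, operating on a flat gate list (i.e. control flow already unrolled). The (name, qubits) gate representation is invented purely for illustration, not an existing type:

from collections import Counter

def count_and_depth(gates):
    """Gate counts and depth for a straight-line circuit.

    `gates` is a sequence of (name, qubit_indices) pairs. Depth is
    the longest path through the dependency DAG: each gate starts
    one layer after the latest gate touching any of its qubits.
    """
    counts = Counter()
    frontier = {}  # qubit index -> layer of the last gate on it
    for name, qubits in gates:
        counts[name] += 1
        layer = max((frontier.get(q, 0) for q in qubits), default=0) + 1
        for q in qubits:
            frontier[q] = layer
    return dict(counts), max(frontier.values(), default=0)

# Three gates on one qubit serialize, so depth is 3:
print(count_and_depth([("x", [0]), ("y", [0]), ("z", [0])]))
# ({'x': 1, 'y': 1, 'z': 1}, 3)
# A two-qubit gate merges the frontiers of both qubits:
print(count_and_depth([("h", [0]), ("x", [1]), ("cx", [0, 1])]))
# ({'h': 1, 'x': 1, 'cx': 1}, 2)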

I imagine the API could look as follows:


import cudaq

@cudaq.kernel
def kernel():
    qubit = cudaq.qubit()
    x(qubit)
    y(qubit)
    z(qubit)

gate_counts = kernel.counts()
print(gate_counts) # { "x": 1, "y": 1, "z": 1 }

# Three sequential gates on the same qubit cannot run in parallel.
depth = kernel.depth()
print(depth) # 3
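To make the control-flow caveat from (1) concrete: a gate inside a loop appears once in the Quake IR but executes a parameter-dependent number of times, so the proposed counts() could only be exact once the trip count is resolved against concrete arguments. A small example:

import cudaq

@cudaq.kernel
def looped(n: int):
    qubit = cudaq.qubit()
    for i in range(n):
        x(qubit)

# The IR contains a single x op under a loop; the executed count is n,
# so a static analysis can only report {"x": n} symbolically, or an
# exact number after the arguments are synthesized into the IR.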

I would appreciate input -- particularly from compiler folks -- on the feasibility of and best ideas for each implementation.

One thing to note on the C++ side: we do support getting that first part at runtime with the Tracer (I don't think this is exposed to Python, though). Of course, that is runtime-only. It would be best if we exposed this type of thing as an MLIR analysis pass (although you'll need runtime argument information there, e.g. post quake-synth).

auto resources = cudaq::estimate_resources(kernel, args...);
// resources has gate types and number of occurrences.
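On the Python side, until a proper analysis pass exists, one rough stopgap is to scrape op mnemonics out of the printed Quake IR. This sketch assumes str(kernel) yields the kernel's MLIR module (an assumption worth verifying) and that the listed non-gate op names match the current Quake dialect; it also shares the control-flow caveat above, since it counts textual occurrences only:

import re
from collections import Counter

def quake_op_counts(kernel) -> Counter:
    """Rough static gate counts scraped from a kernel's Quake IR."""
    # Assumption: str(kernel) returns the Quake MLIR for the kernel.
    ops = re.findall(r"\bquake\.(\w+)", str(kernel))
    # Assumption: these are non-gate Quake ops to filter out.
    non_gates = {"alloca", "extract_ref", "dealloc", "mz", "discriminate"}
    return Counter(op for op in ops if op not in non_gates)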