[Benchmarking] Kernel resource estimation to calculate gate counts and depth
anthony-santana opened this issue · 1 comments
Required prerequisites
- Search the issue tracker to check if your feature has already been mentioned or rejected in other issues.
Describe the feature
We have a need for benchmarking the actual synthesized circuit emitted by our compiler. Metrics of interest are:
- The number of each gate (could have something like a `GateCounts` dictionary type to store this)
- The depth of the circuit
(1) Seems straightforward -- just counting the number of each op in the MLIR -- but is complicated by control flow.
(2) Is a bit more murky to me, as it would require some concept of which gates may be run in parallel. Would this be the number of nodes on the tree?
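For (2), one workable definition is the length of the longest path through the gate dependency graph. Below is a minimal sketch of both metrics over a flat gate trace, assuming we already have a list of `(name, qubits)` pairs — that representation and the `analyze` helper are illustrative assumptions, not CUDA-Q API:

```python
from collections import Counter

def analyze(gates):
    """Compute gate counts and circuit depth from a flat gate trace.

    `gates` is a list of (name, qubit_indices) pairs -- an assumed,
    illustrative representation, not an actual CUDA-Q structure.
    """
    counts = Counter(name for name, _ in gates)
    # ASAP layering: each gate lands one layer after the latest layer
    # already occupied on any qubit it touches; depth is the max layer.
    frontier = {}  # qubit index -> layer of the last gate on that qubit
    depth = 0
    for _, qubits in gates:
        layer = 1 + max((frontier.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            frontier[q] = layer
        depth = max(depth, layer)
    return dict(counts), depth

# Three sequential gates on one qubit cannot run in parallel: depth 3.
counts, depth = analyze([("x", [0]), ("y", [0]), ("z", [0])])
```

Under this definition, gates on disjoint qubits share a layer (e.g. `h` on qubit 0 and `h` on qubit 1 both land in layer 1), which answers the parallelism question without needing an explicit tree.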
I imagine the API could look as follows:
```python
@cudaq.kernel
def kernel():
    qubit = cudaq.qubit()
    x(qubit)
    y(qubit)
    z(qubit)

gate_counts = kernel.counts()
print(gate_counts)  # { "x": 1, "y": 1, "z": 1 }

depth = kernel.depth()
print(depth)  # 3
```
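On the control-flow complication in (1): when loop trip counts are statically known (e.g. after runtime arguments have been specialized), counts from a loop body can be folded in multiplicatively. A hedged sketch over a toy nested op list — the `Loop` node here is a hypothetical stand-in, not a CUDA-Q or MLIR type:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Loop:
    """Toy stand-in for a structured loop op with a static trip count."""
    trips: int
    body: list = field(default_factory=list)

def count_gates(ops):
    """Count gate names in a nested op list, scaling loop bodies by trip count."""
    counts = Counter()
    for op in ops:
        if isinstance(op, Loop):
            # Recurse into the body once, then scale by the trip count.
            for name, n in count_gates(op.body).items():
                counts[name] += n * op.trips
        else:
            counts[op] += 1
    return counts

# An x gate, then a loop applying h and cx three times.
counts = count_gates(["x", Loop(3, ["h", "cx"])])
```

Data-dependent trip counts and branches would still need either a runtime trace (like the C++ Tracer mentioned below) or bounds/symbolic counts from the analysis pass.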
I would appreciate input -- particularly from compiler folks -- on the feasibility of and best approach to each implementation.
One thing to note on the C++ side is that we do support getting that first part at runtime with the Tracer (I don't think this is exposed to Python, though). Of course, this is runtime-only. It would be best if we exposed this type of thing as an MLIR analysis pass (although you'll need runtime argument information there, e.g. post quake-synth).
```cpp
auto resources = cudaq::estimate_resources(kernel, args...);
// resources has gate types and number of occurrences.
```