yardstiq/quantum-benchmarks

Qrack/PyQrack?


Sorry, I can't find the original issue that I thought raised the question, but is Qrack or PyQrack ever going to be added to these benchmarks?

Speaking as one of the authors, I'm sorry to press the issue, but Qrack's performance is unequivocally constant on single X, H, T, and CNOT gates at any arbitrary qubit width, due primarily to optimization via proactive and reactive Schmidt decomposition, with very loose inspiration taken from "Pareto-Efficient Quantum Circuit Simulation Using Tensor Contraction Deferral." Redundantly, Qrack's extended stabilizer simulation capabilities cover these cases as well, but I think the comparison would be fair, because these and the other optimizations in the default optimal Qrack "layer stack" are exact and generally universal, without limitation to the Clifford group, for example. (Also, this is a claim about performance on these benchmarks specifically, not anything as unlikely as general constant or linear scaling, which is why Qrack has historically presented representatively general, harder cases in our own benchmarks.)
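For concreteness, here is a minimal timing sketch along those lines, assuming the PyQrack `QrackSimulator` API as I understand it (method names like `h`, `x`, `t`, and `mcx` should be checked against the PyQrack docs):

```python
# A minimal sketch: time single X, H, T, and CNOT gates at growing qubit widths.
import time
from pyqrack import QrackSimulator

for width in (10, 20, 30, 40):
    sim = QrackSimulator(width)
    start = time.perf_counter()
    sim.h(0)
    sim.x(1)
    sim.t(2)
    sim.mcx([0], 1)  # CNOT as a controlled-X with a one-qubit control list
    print(f"{width} qubits: {time.perf_counter() - start:.2e} s")
```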

I'm saying, I understand that a stabilizer simulator would break these trends without a universal gate set, but that is not the case for these benchmarks run in Qrack, which can perform this way while still admitting an exactly simulated universal gate set in the same case. Is the motivation for leaving Qrack out of the benchmarks the assumption that its default optimal settings wouldn't be universal, as with Clifford-only simulation? (To be clear, they are universal!)

By the way, I just realized that the original mention of this was in an open PR, rather than an issue. Sorry about that. I don't mean to press too hard, but I guess I do mean to press the issue. I understand that Qrack breaks the trend shared by all the other candidates, but it does so while accommodating a universal gate set in the same use case, so I don't know why Qrack doesn't qualify for consideration. I'm happy to furnish anything you need, including existing and new validation. (It's something that can wait until 2022, though. Happy holidays, by the way!)

No, it's not about the universal gate set or anything else. It's about comparing apples to apples. I hope to compare stabilizer simulation with stabilizer simulation (given that there are other stabilizer simulators), and circuit optimization (with different targets) with circuit optimization.

The previous benchmark should not include any form of circuit optimization/compilation, since the goal of that benchmark is to show the quality of a single simulation instruction. We didn't enable any optimization/compilation pipeline for the existing benchmarks (this is also why we needed to revisit Cirq recently).

The proper way of benchmarking optimization/compilation is to enable it for every simulator, not just Qrack. The reason #24 was not merged is that, theoretically, this type of instruction has to show exponential growth in time, which doesn't make sense for the Qrack benchmark right now.

I think we are currently not sure whether there is a mode to compare the full-state gate instruction; otherwise, we have to leave the PR pending until we set up the stabilizer benchmark.

And sorry, I just realized you commented on that PR in July and somehow we didn't notice. I think the best way of adding Qrack is still to have you review and maybe implement some of the benchmarks, since you know the details better than anyone else. Then we can also list Qrack in the benchmark with a footnote about the actual simulation methods.

@Roger-luo Thank you for the helpful response, and I'm sorry that I didn't ask for your reasoning first. This does make sense to me. Our Schmidt decomposition technique is really not a compilation technique, but I still see how this would be like comparing MPS to ket simulation, for example, and I see why that isn't the point.

It's not the preferred default simulation method, but we can very easily get Qrack or PyQrack to carry out just a ket-based simulation. I'll explain on the PR, and PyQrack is probably about the same performance for the task, if Python makes it easier.
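Roughly, it would be a sketch like the following (the exact constructor keyword names may differ by PyQrack version, so treat them as assumptions and check the PyQrack docs):

```python
# A sketch of forcing a plain ket (Schroedinger method) simulation in PyQrack,
# assuming constructor flags along these lines exist in the installed version.
from pyqrack import QrackSimulator

sim = QrackSimulator(
    20,
    isSchmidtDecompose=False,  # disable the Schmidt decomposition layers
    isStabilizerHybrid=False,  # disable the hybrid stabilizer layer
)
sim.h(0)
sim.mcx([0], 1)            # entangle qubits 0 and 1 into a Bell pair
print(sim.m(0), sim.m(1))  # measure both qubits; results should agree
```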

@WrathfulSpatula If you want to compare different algorithms rather than different implementations, you can open a new repository for that. I am quite interested to see the performance of different algorithms. What I can help with is the tensor network backend benchmark. FYI: Yao has a tensor network backend here: https://github.com/QuantumBFS/YaoToEinsum.jl

@GiggleLiu I'd love to help make that comparison and start the repository! I'd have to think about which tests are representative for cross-comparison between different methods, though, and I'd love your input, because it's been a perennial problem for Qrack that many test cases are trivial for our method and others, yet don't realistically represent general performance.

Qrack's go-to has been random 3-parameter unitary single-qubit gate initialization across the full width of a simulator, followed by a quantum Fourier transform. For example, |0> or any permutation basis eigenstate initialization before a QFT is a solved problem for Qrack, but I can't think of a reason why that set of initialization conditions is a meaningful representation of general-case performance for anything. With the random initialization, though, it's a "hard problem" for Qrack. Can you suggest any other cases you'd like to see?
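A minimal sketch of that "hard case" in PyQrack, assuming gate methods `u`, `h`, `mcu`, and `swap` with the signatures I believe the API uses (worth verifying against the PyQrack docs):

```python
# Random 3-parameter unitary on every qubit, then a textbook QFT.
import math
import random
from pyqrack import QrackSimulator

width = 20
sim = QrackSimulator(width)

# Random U(theta, phi, lambda) on each qubit defeats trivially separable
# initializations like |0...0>.
for q in range(width):
    sim.u(q, random.uniform(0, math.pi), random.uniform(0, 2 * math.pi),
          random.uniform(0, 2 * math.pi))

# QFT: Hadamard plus controlled phase rotations, then a qubit-order reversal.
for i in range(width):
    sim.h(i)
    for j in range(i + 1, width):
        # U(0, 0, lam) acts as a phase gate diag(1, e^{i*lam}).
        sim.mcu([j], i, 0, 0, math.pi / (1 << (j - i)))
for i in range(width // 2):
    sim.swap(i, width - 1 - i)
```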

I think the QFT is a good one; another can be the Google Sycamore circuit. See brenjohn/Contraction-Order-Bench#2

Just for clarification: I'm not saying I want to reject different algorithms, but the specific PR SK wrote was meant to compare exact simulation instructions. We didn't know much about the Qrack algorithm, so we couldn't compare the Schmidt decomposition approach with our best knowledge.

And currently, I think Qrack actually supports exact simulation? The point is that it would be nice to have this as a baseline, then include the optimized benchmark and note in the markdown file what the method is and what the difference is (similar to the ddqsim package).

This project is expected to include more benchmarks, since it has been turned into an open-source collaboration under the Unitary Fund (based on our previous benchmarks from Yao), so we have some basic infrastructure to host benchmarks. But I think SK has probably not been active recently, so there hasn't been much progress since the summer. We have, however, already included different algorithms and compilation benchmarks in the repo. So I'm open to new benchmarks in general; we just need to make sure things are apples to apples.

@Roger-luo That all makes perfect sense to me. (Honestly, perhaps TMI: I felt bad after my initial issue comment, partly because I spent my entire PTO vacation in COVID exposure quarantine, and I think I was cranky. Like basically everybody else right now, it seems.)

I'm doing this as the independent lead developer of Qrack, rather than explicitly as a member of the Unitary Fund, but I'd love to finish up the PR in the next couple of days, with other methods as appropriate. Apples-to-apples is perfectly valid, in that "ket" or "Schrödinger method" performance can vary very significantly depending on implementation details. This underlies Qrack as well, so if we're hypothetically slower at pure ket simulation than anyone else, I'd like to know and to learn from the comparison!

Yes, Qrack's method is exact and universal, by the way! By default, basically everything one does with Qrack is exact up to inherent floating-point truncation. We do estimate a systematic error floor for that truncation, and we try to self-consistently recover some performance by flooring amplitudes to 0 below 2^-33 in norm (for float precision), but only when we're normalizing for some reason (and even this can be turned off while still normalizing).
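To illustrate the idea (a toy NumPy sketch, not Qrack's internal code): during normalization, amplitudes whose probability falls below the ~2^-33 floor are zeroed, and the remaining state is renormalized.

```python
# Toy illustration of amplitude flooring during normalization.
import numpy as np

FLOOR = 2.0 ** -33  # approximate single-precision systematic error floor

def floor_and_normalize(ket: np.ndarray) -> np.ndarray:
    ket = ket.copy()
    ket[np.abs(ket) ** 2 < FLOOR] = 0.0  # zero amplitudes below the norm floor
    return ket / np.linalg.norm(ket)     # renormalize the surviving amplitudes
```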

The default optimal method of Qrack is also exact, and, from my limited knowledge of tensor network methods, it's very similar to MPS, except that it uses "ket" subsystems rather than rank-2 density matrices, and it reactively "composes" the subsystems exactly (by Kronecker product) when user code reaches for any entangling gate, rather than setting an approximation tolerance for which principal components to preserve. I'm now experimenting with approximation methods that are similarly MPS-like, with a tolerance, but still not quite the same.
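A toy NumPy sketch of that reactive composition (again, illustrative only, not Qrack's internals): separable subsystems are kept as independent kets and only Kronecker-composed when an entangling gate spans them.

```python
# Separable subsystems stay factored until an entangling gate spans them.
import numpy as np

ket_a = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2.0)  # qubit A in |+>
ket_b = np.array([1.0, 0.0], dtype=complex)                 # qubit B in |0>

# A CNOT spanning A and B forces exact composition into one 4-amplitude ket.
joint = np.kron(ket_a, ket_b)
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
joint = cnot @ joint  # Bell state (|00> + |11>) / sqrt(2)
```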

We're definitely an oddball method, and we've long tried to keep our benchmarks fair by presenting only representatively hard circuit cases, where exact simulation still scales exponentially for us. But thank you again for including us, and I'd love to put the right context and caveats around Qrack!

I think we can close this issue, now that the benchmarks for Qrack have been added. Thank you!

Let's leave this open for now, since I haven't updated the benchmark results on the website.