yardstiq/quantum-benchmarks

PyQuEST-cffi mislabeled as QuEST

Closed this issue · 5 comments

Hi there,

It's excellent to see an initiative benchmarking the wide suite of available QC emulators!

However, it appears PyQuEST-cffi is mislabeled as "QuEST" in the plot legends.
PyQuEST-cffi is an independent project by HQS to write Python bindings for the C project QuEST, on which I myself work. These Python bindings carry overhead on top of the underlying QuEST C functions, and hence their performance (especially with heavy iteration in Python) can be significantly worse than QuEST's, which is not benchmarked here. Is it possible to correct these legends?

Note I believe, like Yao, PyQuEST-cffi supports GPU in addition to CPU (since QuEST supports multithreading, GPU and distribution).

Thanks very much,
Tyson

@TysonRayJones Thanks for these very helpful comments.

However, it appears PyQuEST-cffi is mislabeled as "QuEST" in the plot legends.

thanks, fixed by 801e4e9

PyQuEST-cffi is an independent project by HQS to write Python bindings for the C project QuEST, on which I myself work.

I see, I didn't notice the relationship between these two.

These Python bindings carry overhead on top of the underlying QuEST C functions, and hence their performance (especially with heavy iteration in Python) can be significantly worse than QuEST's, which is not benchmarked here.

I don't think it will be significantly worse, since there is no large for loop in our Python benchmark script, and for the single-gate benchmark it should have similar timings to pure C, since there is only a small overhead from cffi (note I didn't use any wrapper code, just direct FFI calls). But it won't hurt to add pure C benchmarks for sure.
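To illustrate the point about direct FFI calls carrying only a small per-call overhead, here is a minimal sketch. It uses the stdlib `ctypes` rather than cffi so it is self-contained, and it calls libm's `sqrt` as a stand-in for a QuEST gate function; none of this is the actual benchmark code.

```python
import ctypes
import ctypes.util
import timeit

# Stand-in for a C library: load the system math library. pyquest-cffi
# calls libQuEST via cffi instead, but the per-call cost pattern is the
# same idea. Fall back to the process's global symbols if lookup fails.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# A single FFI call behaves like the native C function:
assert libm.sqrt(4.0) == 2.0

# The FFI overhead is a roughly constant additive cost per call, so a
# benchmark that times one heavyweight gate application (rather than a
# large Python loop of cheap calls) is barely affected by it.
ffi_time = timeit.timeit(lambda: libm.sqrt(2.0), number=100_000)
py_time = timeit.timeit(lambda: 2.0 ** 0.5, number=100_000)
print(f"FFI call: {ffi_time:.4f}s, native op: {py_time:.4f}s")
```

The timing comparison only bounds the per-call overhead; for a gate acting on a large state vector, the C kernel dominates and the FFI cost becomes negligible.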

Note I believe, like Yao, PyQuEST-cffi supports GPU in addition to CPU (since QuEST supports multithreading, GPU and distribution).

IIUC, QuEST's CUDA backend can only be enabled at compile time? I didn't find any API to enable it (whereas in Yao, or things like QCGPU, it can be enabled at runtime, which makes it possible for other programs to call it), so I don't think it is possible to enable it from the Python side, but it should be straightforward once we have a pure C benchmark.

thanks, fixed by 801e4e9

Thanks very much for this! 'quest-cffi' is better, though still distinct from its actual 'pyquest-cffi' name :)

I don't think it will be significantly worse, since there is no large for loop in our Python benchmark script, and for the single-gate benchmark it should have similar timings to pure C, since there is only a small overhead from cffi (note I didn't use any wrapper code, just direct FFI calls). But it won't hurt to add pure C benchmarks for sure.

Yeah, possibly - I couldn't locate the benchmark script myself to check.

IIUC, QuEST's CUDA backend can only be enabled at compile time? I didn't find any API to enable it (whereas in Yao, or things like QCGPU, it can be enabled at runtime, which makes it possible for other programs to call it), so I don't think it is possible to enable it from the Python side, but it should be straightforward once we have a pure C benchmark.

Yep, that's correct. Enabling GPU at runtime is very interesting! Can you elaborate on what that enables, though?

Thanks very much for this! 'quest-cffi' is better, though still distinct from its actual 'pyquest-cffi' name :)

Oh, sorry, I didn't intend to use that name.

Yeah, possibly - I couldn't locate the benchmark script myself to check.

I've emailed them, and also updated the contributing guide.

Yep, that's correct. Enabling GPU at runtime is very interesting! Can you elaborate on what that enables, though?

It is just a runtime type that lets your interpreter/compiler know which memory you are using. It is not possible to implement this feature in pure C without implementing your own type system or using different APIs (for CUDA), since C does not support generic types. It is trivial in runtime-typed languages like Python and Julia. However, it is possible to support it in C++ by defining streams with templates, so the user won't need to switch between APIs.

One way to do this is via templates, specializing them while porting to a runtime frontend (say, Python); another way is to define a runtime dispatch table. The former approach is used in mshadow: https://github.com/apache/incubator-mxnet/tree/master/3rdparty/mshadow and the latter in PyTorch's ATen: https://github.com/pytorch/pytorch/tree/master/aten/src/ATen
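To make the runtime-dispatch-table idea concrete, here is a hypothetical Python sketch (not actual ATen or QuEST code, and all names are illustrative): each backend registers its kernels in a table keyed by device type, and the frontend looks the kernel up at call time based on where the state lives.

```python
# Table mapping (operation name, device tag) -> kernel function.
DISPATCH_TABLE = {}

def register(op, device):
    """Decorator registering a kernel for an (op, device) pair."""
    def wrap(fn):
        DISPATCH_TABLE[(op, device)] = fn
        return fn
    return wrap

@register("apply_gate", "cpu")
def apply_gate_cpu(state):
    return [2 * x for x in state]      # placeholder CPU kernel

@register("apply_gate", "gpu")
def apply_gate_gpu(state):
    return [2 * x for x in state]      # would launch a CUDA kernel

class StateVector:
    def __init__(self, amplitudes, device="cpu"):
        self.amplitudes = amplitudes
        self.device = device           # the runtime type tag

    def apply_gate(self):
        # Dispatch resolved at call time, based on the device tag.
        kernel = DISPATCH_TABLE[("apply_gate", self.device)]
        return kernel(self.amplitudes)

# The backend is chosen at runtime, not at compile time:
print(StateVector([1.0, 0.0]).apply_gate())
print(StateVector([1.0, 0.0], device="gpu").apply_gate())
```

In a runtime-typed language the device tag is just an ordinary attribute, which is why this pattern is trivial in Python or Julia but needs templates or an explicit table in C/C++.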

There is not much documentation for these, but I think the code bases should be straightforward to read.

But a possible approach to implementing this in pure C would be to write your own runtime dispatcher, which dispatches functions based on their context. You can find an example of this technique in my old fork of TH (it is no longer used by torch, I think): https://github.com/Roger-luo/TH/blob/master/generic/THVectorDispatch.c
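A rough Python analogue of that C function-pointer dispatcher, just to show the shape of the technique (all names here are illustrative, not from TH or QuEST): the context resolves its function pointers once, at creation time, so later calls go straight through the stored pointers with no per-call device check.

```python
def cpu_fill(buf, value):
    return [value] * len(buf)

def gpu_fill(buf, value):
    return [value] * len(buf)   # would call a CUDA kernel in real code

# Per-device "vtables" of function pointers, as a C dispatcher would
# hold in a struct of function pointers.
BACKENDS = {
    "cpu": {"fill": cpu_fill},
    "gpu": {"fill": gpu_fill},
}

class Context:
    def __init__(self, device):
        # Dispatch happens once, here; afterwards self.fill behaves
        # like a plain function pointer, just as in the C version.
        self.fill = BACKENDS[device]["fill"]

ctx = Context("cpu")
print(ctx.fill([0, 0, 0], 1.0))
```

The difference from a per-call dispatch table is when the lookup happens: resolving it at context creation keeps the hot path free of branching, which matters for tight C kernels.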

I think this issue is resolved now (the label is pyquest-cffi now). Thanks!

Please feel free to open a PR for the pure C benchmark and to comment in new issues.