yardstiq/quantum-benchmarks

potential incorrect benchmark for qiskit-gpu

Closed this issue · 4 comments

Hi @chriseclectic

Thanks for your previous contribution to get the CPU benchmark correct. However, I'd like to check the following result. We recently went through the entire benchmark again because the qiskit-gpu result looks very strange: the timing barely scales with the number of qubits at all. I suspect that cudaDeviceSynchronize is not called when the qiskit-gpu benchmark runs. Even for 30 qubits, the timing is only 5 ms.

In addition, I didn't find any call to cudaDeviceSynchronize in the source code: https://github.com/Qiskit/qiskit-aer/blob/master/src/simulators/statevector/qubitvector_thrust.hpp
I feel it is unlikely that thrust does the sync implicitly, but I could be wrong.
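
To illustrate the concern: if a launch returns before the work has finished, a timer stopped right after the launch measures only the launch overhead. The sketch below is a plain-Python analogy, not qiskit-aer or CUDA code. The thread pool stands in for the GPU, `fake_kernel` is a hypothetical workload, and `future.result()` plays the role of `cudaDeviceSynchronize`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    """Hypothetical stand-in for GPU work that takes `duration_s` seconds."""
    time.sleep(duration_s)
    return duration_s


def time_launch(executor, duration_s, synchronize):
    """Time one 'launch'; wait for completion only if `synchronize` is set."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)  # returns immediately
    if synchronize:
        future.result()  # analogous to cudaDeviceSynchronize()
    elapsed = time.perf_counter() - start
    future.result()  # drain the work so it does not leak into the next timing
    return elapsed


with ThreadPoolExecutor(max_workers=1) as pool:
    unsynced = time_launch(pool, 0.2, synchronize=False)
    synced = time_launch(pool, 0.2, synchronize=True)

print(f"without sync: {unsynced * 1e3:.2f} ms")
print(f"with sync:    {synced * 1e3:.2f} ms")
```

In this analogy the unsynchronized timing stays near zero no matter how long the workload runs, which is the same symptom as a benchmark whose time does not grow with the number of qubits.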

Moreover, the timing differs by 100x from what qulacs and Yao report. Since qulacs and Yao are implemented independently and their benchmark results match each other, I believe the problem could be real. But I'd like your help to confirm this.

FYI: even summing a vector of size 2^30 in complex<float64> takes 24 ms.
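
As a rough sanity check on these numbers, the arithmetic below estimates the memory throughput each timing would imply. It assumes that complex<float64> means 16 bytes per amplitude and that the operation reads every amplitude exactly once; the helper names are made up for this sketch.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: complex128, i.e. two float64 values


def statevector_bytes(n_qubits):
    """Memory needed to hold 2**n_qubits amplitudes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


def implied_bandwidth_gb_s(n_qubits, seconds):
    """GB/s needed to read every amplitude once in `seconds`."""
    return statevector_bytes(n_qubits) / seconds / 1e9


size_gib = statevector_bytes(30) / 2 ** 30
print(f"30-qubit state vector: {size_gib:.0f} GiB")
print(f"one pass in 24 ms -> {implied_bandwidth_gb_s(30, 24e-3):.0f} GB/s")
print(f"one pass in  5 ms -> {implied_bandwidth_gb_s(30, 5e-3):.0f} GB/s")
```

Under these assumptions, a 5 ms run that touched the whole state even once would need about 4.8 times the throughput of the 24 ms reduction.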

Thanks a lot.
Roger

@chriseclectic Update: we tried to check the timing with nvprof, but nvprof gives no timing information for qiskit-gpu. Moreover, we cannot observe any GPU usage in nvidia-smi.
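
For anyone who wants to repeat the nvidia-smi check while a benchmark is running, here is a minimal sketch that takes one utilization sample per GPU. The function names are made up for this sketch, and it assumes `nvidia-smi` is on the PATH; it returns `None` when the tool is missing or the query fails.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse lines like '37, 1024' into (utilization %, memory MiB) tuples."""
    stats = []
    for line in text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def sample_gpu_stats():
    """Return one sample per GPU, or None if the query cannot be made."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        return parse_gpu_stats(out.stdout)
    except (subprocess.CalledProcessError, OSError, ValueError):
        return None


print(sample_gpu_stats())
```

Calling `sample_gpu_stats()` in a loop from a second terminal during the run shows whether utilization and memory use ever rise above their idle values.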

We have to submit our final version of the paper to the editor this weekend. Please help us check this as soon as possible. Thanks again.

Copying a summary of our Slack conversation here.

It looks like the tests aren't running on the GPU simulator because of an incompatible CUDA version. Currently we require CUDA 10.2 because of our requirement of a C++14 compiler. @atilag is currently working on modifying our build system so we can get it working with CUDA 10.1. We will push an update to our master branch as soon as we can.

After more investigation I found that building from source should work with CUDA 10.1. However, we just changed the build system we use, and it looks like the new system has some bugs with the NVCC compiler. I was able to build the stable branch from source with CUDA 10.1 and GCC 8, though.

We are aiming to get a bug fix release out in the next few days that will let the PyPI package work with 10.1 as well.

Currently fixed via a native build! Thanks, everyone!

For anyone interested in reproducing this, you need to rebuild qiskit-aer using the following commands from @chriseclectic:

git checkout stable/0.5
python setup.py bdist_wheel -- -DAER_THRUST_BACKEND=CUDA -DCMAKE_CXX_COMPILER=g++-8
pip install dist/*.whl