QuEST-Kit/QuEST

Segmentation fault crash at larger N using diagonal operators in CUDA

Closed this issue · 3 comments

I use the diagonal operator routines a lot in my research, and I've noticed that if I try to define or apply them for N=19 or more qubits in CUDA on "older" GPUs (either a P6000 or a GP100 in workstations in my group), the program crashes and reports a segmentation fault on the command line. The code works normally for N=18 or below. However, when I run identical code on a newer GPU (a V100 in the cluster at my university), I don't see the crash and can run up to N=22 (the largest I've tested so far). Do you have any idea why this is occurring? I use the older GPUs for tinkering and prototyping (and our cluster has had issues lately), so it would be handy to be able to use them for larger-N simulations if possible. This is with a fresh install of CUDA on Ubuntu.

Thanks,
Eliot

Hi Eliot,

I'm unable to replicate this on our own P6000, running up to 29 qubits. Note that you'll see many segmentation faults if you supply the wrong GPU_COMPUTE_CAPABILITY for your GPU at compile time. As per here:

  • P6000 requires GPU_COMPUTE_CAPABILITY=61
  • V100 requires GPU_COMPUTE_CAPABILITY=70
  • GP100 requires GPU_COMPUTE_CAPABILITY=60

Remember, when changing the target GPU you must recompile everything, including QuEST's GPU backend (so you should run make clean first).
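
For example, rebuilding for the P6000 with the CMake build would look something like the below; the exact option names can differ between QuEST versions, so check the build documentation for your release.

make clean
cmake -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=61 ..
make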

I tested with this code:

#include <QuEST.h>
#include <math.h>
#include <stdio.h>

int main() {

    int n = 25;   // number of qubits

    QuESTEnv env = createQuESTEnv();
    Qureg qureg = createQureg(n, env);
    printf("state created\n");

    initPlusState(qureg);
    qreal amp = 1./sqrt(pow(2, n));
    printf("state amp: %g, expected amp: %g\n", getRealAmp(qureg, 0), amp);

    DiagonalOp op = createDiagonalOp(n, env);
    printf("diag created\n");
    printf("expected #elems: %lld, actual #elems: %lld\n",
        1LL << n, op.numElemsPerChunk);

    long long int i = (1LL << n) - 1;   // index of the last diagonal element
    op.real[i] = 10;
    op.imag[i] = 20;
    printf("diag modified\n");

    syncDiagonalOp(op);   // push the modified host-side elements to GPU memory
    printf("diag synched\n");

    applyDiagonalOp(qureg, op);
    printf("diag applied\n");

    qreal expecRe = 10*amp;
    qreal expecIm = 20*amp;

    printf("difRe: %g, difIm: %g\n",
        getRealAmp(qureg, i) - expecRe,
        getImagAmp(qureg, i) - expecIm);

    destroyQureg(qureg, env);
    destroyDiagonalOp(op, env);
    destroyQuESTEnv(env);
    return 0;
}

and received the correct output of:

state created
state amp: 0.000172633, expected amp: 0.000172633
diag created
expected #elems: 33554432, actual #elems: 33554432
diag modified
diag synched
diag applied
difRe: 0, difIm: 0

Hi Tyson! Thanks for this -- it executed correctly on my machine, and I was then able to trace the crash to something that had nothing to do with QuEST. It turned out to be C++ array memory allocation. I never use arrays because vectors are so much easier, but I needed them for QuEST, since I couldn't set the diagonal operator elements from a vector (whereas I could from an identical 1D array). For whatever reason the compiler on the cluster computers with V100s defused that issue while the one on my local machine didn't, which is why I misidentified it as a GPU problem.
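
For reference, the mistake was roughly of this shape. This is a simplified sketch rather than my exact code, and the stack-vs-heap explanation is my best guess at the mechanism: two double-precision arrays of 2^19 elements total 8 MB, which is right at a common default stack limit, and would explain why N=18 worked but N=19 didn't.

#include <QuEST.h>
#include <vector>

int main() {
    int n = 19;
    long long int dim = 1LL << n;

    QuESTEnv env = createQuESTEnv();
    DiagonalOp op = createDiagonalOp(n, env);

    // What I had, roughly: fixed-size local arrays. At n = 19 each is
    // 2^19 doubles = 4 MB, so the pair alone can hit a typical 8 MB
    // stack limit and segfault before any QuEST call is reached.
    //
    //     qreal myReals[1 << 19];
    //     qreal myImags[1 << 19];

    // Heap-backed storage (std::vector, new[], malloc) has no such limit:
    std::vector<qreal> myReals(dim, 1);
    std::vector<qreal> myImags(dim, 0);

    // copy into the operator's host buffers and push them to the GPU,
    // exactly as in the test program above
    for (long long int j = 0; j < dim; j++) {
        op.real[j] = myReals[j];
        op.imag[j] = myImags[j];
    }
    syncDiagonalOp(op);

    destroyDiagonalOp(op, env);
    destroyQuESTEnv(env);
    return 0;
}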

Sorry for wasting your time! I really appreciate your efforts and QuEST has been a real game changer for some of my research projects.

No problem at all, these things happen!
It's valuable to know that some C++-specific interfaces to the QuEST API would be useful; I'll add these to my backlog.
Great to hear you've found QuEST useful! :)