NVIDIA/jitify

Questions: what can be done asynchronously?

DavidPoliakoff opened this issue · 3 comments

Hey folks,

This issue is really a question; feel free to close it once it's answered. No code changes needed.

I'm trying to use Jitify asynchronously. I have a pipeline that asynchronously creates program templates to hand off to Jitify (or Cling), launches a generic version of the kernels to be JIT-compiled while waiting, and then swaps in the NVRTC products when they're available. Jitify prints the PTX it asynchronously generates (yay) and then fails with CUDA_ERROR_INVALID_CONTEXT (aww).

I'm assuming this is because I'm creating a KernelLauncher on a different thread than the one on which I wish to execute it. std::async launches tasks in another thread, and if Jitify picks up the default CUDA context for that thread, I don't know what happens when the KernelLauncher is returned to a different thread with a different CUDA context. My questions are:

  1. Is there a way to pass a CUDA context to the KernelInstantiation so that the KernelLauncher is created under that context?
  2. If not, how far down the Jitify stack can I go in another thread before CUDA contexts become relevant? My intuition is that I can instantiate a kernel, I just can't configure it, but let me know if I'm wrong there.

Thanks again for your help!

Answered question 2 for myself: if I can't pass a context around, I actually need to form the program itself; everything after kernel_cache.program(recipe,0) appears bound to a given CUDA context. This is workable, but not preferable.

Thanks for the report, I can reproduce the error you're seeing.

A solution is to call a CUDA Runtime API function such as cudaSetDevice(gpu_index) (or even cudaFree(0)) in the new thread before calling Jitify functions (which internally call the CUDA Driver API). The first call to a Runtime function will automatically set the context in the thread. Let me know if that works for your application.

@benbarsdell, that's really good work; it solved the problem. Thanks!