NVIDIA/jitify

Is --expt-extended-lambda and --expt-relaxed-constexpr not supported?

anshumang opened this issue · 5 comments

From the nvrtc docs, it seems that the above options are not supported. Can you please confirm?
@maddyscientist

@anshumang these restrictions stem from the fact that they rely on host-device compiler interactions, and since nvrtc is a pure GPU compiler, I guess it shouldn't be surprising that they are not supported.

  • The lack of support for --expt-relaxed-constexpr is not in any way a restriction, since nvrtc is capable of interpreting all functions presented to it as __device__ functions. This option is enabled by jitify, so things will just work as is.
  • For --expt-extended-lambda, in the general case the lambda will be defined in offline code, and not visible to the run-time compiler, so it would not be possible for nvrtc to support it. You would need to define the lambda in code visible to the runtime compiler in order for nvrtc to compile this. I suppose it would be possible to compile the offline lambda to a ptx function and then so long as the ptx was available at runtime, then nvrtc could inline this at runtime in the kernel. Likely a multitude of headaches associated with such an approach that would need to be solved, but I think it's at least possible.
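To make the two points above concrete, here is a minimal sketch of what "code visible to the runtime compiler" could look like. The kernel name `apply`, the helper `square`, and the function `nvrtc_options` are hypothetical names made up for illustration. The sketch assumes the source string is later handed to nvrtc (for example through nvrtcCreateProgram and nvrtcCompileProgram) together with nvrtc's `--device-as-default-execution-space` option, which treats unannotated functions as `__device__` functions. It only prepares the inputs and does not call nvrtc itself.

```cpp
#include <string>
#include <vector>

// CUDA source kept as a string so that the runtime compiler sees all of it,
// including the body of the lambda. All names here are hypothetical.
const std::string kKernelSource = R"CUDA(
// No __device__ annotation: this relies on the runtime compiler treating
// unannotated functions as device functions.
constexpr int square(int x) { return x * x; }

extern "C" __global__ void apply(int* data, int n) {
    // The lambda is defined inside the runtime-compiled source, so no
    // host-device compiler interaction is needed to compile it.
    auto op = [](int v) { return square(v) + 1; };
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = op(data[i]);
}
)CUDA";

// Options that would accompany the source when it is passed to nvrtc.
// Assumption: the caller forwards these to nvrtcCompileProgram.
std::vector<std::string> nvrtc_options() {
    return {"--device-as-default-execution-space"};
}
```

The design point is simply that both the `constexpr` helper and the lambda live in the string, not in the offline-compiled host code, so the runtime compiler has everything it needs.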

@benbarsdell agree with the above?

Thanks for the detailed answer @maddyscientist
That helps.
For moving a kernel from offline to runtime compilation in a code base that has --expt-relaxed-constexpr and --expt-extended-lambda among the nvcc options, I am considering invoking nvcc at runtime instead of going the nvrtc/jitify path. This is to keep code changes (to account for nvcc and nvrtc differences) to a minimum. The host code is performance sensitive, and I do not yet know whether invoking nvrtc vs. nvcc would have an impact. Also, there would be more boilerplate in the host code when using nvcc, which is less of a concern. If you have some thoughts on problems with the nvcc approach, please do share. Thanks!

As I said, the --expt-relaxed-constexpr issue isn't a problem at all. Things will just work without this flag.

For the lambda issue, though, could you show me an example of what the code that uses extended lambdas looks like? Jitify does have lambda-like functionality using the JITIFY_LAMBDA macro; I wonder if this could satisfy your need.

The problem with using nvcc at runtime is that you have to ensure that wherever the code runs, you have the full CUDA toolkit and host compiler available. So, for example, if you know that your development system and deployment system are the same, this isn't an issue.
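For reference, shelling out to nvcc at runtime might look roughly like the sketch below. The function names `build_nvcc_command` and `compile_with_nvcc` and the file paths are hypothetical. The flags are the two from this issue plus nvcc's `-ptx` and `-o`. It assumes `nvcc` and a host compiler are on the PATH of the machine where the program runs, which is exactly the deployment requirement described above.

```cpp
#include <cstdlib>
#include <string>

// Build the nvcc command line. Paths are quoted so that spaces survive the
// shell. The function name and argument names are hypothetical.
std::string build_nvcc_command(const std::string& src_path,
                               const std::string& ptx_path) {
    return "nvcc --expt-relaxed-constexpr --expt-extended-lambda -ptx \"" +
           src_path + "\" -o \"" + ptx_path + "\"";
}

// Run nvcc through the system shell. Assumption: a zero return value from
// std::system means nvcc succeeded, which holds on typical POSIX systems but
// is implementation-defined in general.
bool compile_with_nvcc(const std::string& src_path,
                       const std::string& ptx_path) {
    return std::system(build_nvcc_command(src_path, ptx_path).c_str()) == 0;
}
```

A caller would then load the resulting PTX file with the CUDA driver API. If nvcc is missing on the deployment host, `compile_with_nvcc` returns false, so a real application would need a fallback or a clear error at that point.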

Let me see if I can share a representative snippet.
Yes, toolkit installation on the deployment host is something I missed. That is possible.
Good to know that otherwise, invoking nvcc vs. using nvrtc would be the same.

Great, it will be good to see a snippet. In general, I would say that it would be good to understand issues where nvrtc / Jitify isn't a good match. E.g., if you need to shell out to nvcc at runtime, perhaps that represents a weakness in the nvrtc model that needs to be addressed. Thx.