NVIDIA/jitify

Support for curand

mondus opened this issue · 12 comments

I have been making lots of good progress with jitify. Thanks for the excellent tool. One issue which I am currently unable to resolve however is the use of curand.

If I include curand in my jitify kernel (e.g. #include <curand_kernel.h>) and correctly set the compiler to add the cuda include directory (from CUDA_PATH) then there are a whole bunch of errors from cuda.h relating to ambiguous definitions of size_t

e.g.

cuda.h(1773): error: "size_t" is ambiguous

Thanks for the feedback!

I tried to reproduce the problem but ran into a different issue related to missing headers (which I'll submit a patch for).

Could you tell me what CUDA version you're using and provide a minimal reproducer?
*EDIT: Could you also provide what platform you're running on.

Hi @benbarsdell. Thanks for getting back to me. I saw your talk with Kate at GTC when you first introduced jitify and knew I would need it at some point!

I am using CUDA 10.1 on windows using dlfcn-win32 (see #42). A minimal example is to add the line

"#include <curand_kernel.h>\n"

at line 159

Note: This actually works on linux no problem. To reproduce on windows requires building with dlfcn-win32, setting a pre-porcessor macro of NOMINMAX , updating the hard coded include directory and pre-processing (with stringify) the my_header3.cuh file to generate to stringified version. I also had to create a disk version of my_header4.cuh as the callback function didn't seem to work.

I have noticed that one of the later tests also fail on windows. E.g. line 266 fails. Looking at the get_constant_ptr and demangle functions this is because it is expecting the name mangling to be different in visual studio. My windows ptx has the same mangling as in linux however. E.g.

.const .align 4 .u32 a;
.const .align 4 .u32 _ZN1b1aE;
.const .align 4 .u32 _ZN1c1b1aE;

Hence the lookup fails.

That's interesting that you are not seeing the expected name mangling on Windows. Which compiler are you using? I guess we need to fix up the demangle detection for this.

Hi @maddyscientist it is Visual Studio 2019 with CUDA 10.1. I noticed that the demangle detection is easy to fix by changing a macro guard but the actual demangling would require cxxabi which is not so easy to come by on windows...

I was able to reproduce your issues in Visual Studio 2019 with CUDA 10.2, and I have fixes for them. I'll put them into a PR probably tomorrow. I'll also add workarounds for the NOMINMAX and dlfcn problems to avoid those annoyances.

The size_t issue is because jitify provides a definition of it in a built-in header, but NVRTC already provides its own built-in definition, and on Windows these definitions conflict. The fix is to remove these lines:

jitify/jitify.hpp

Lines 1578 to 1579 in 8af928e

"typedef unsigned long size_t;\n"
"typedef signed long ptrdiff_t;\n"

The name mangling issue is because CUDA (PTX) always uses the Itanium mangling scheme even when compiling with Visual Studio. I was able to implement a simple demangler for variable names that avoids needing cxxabi.

With these fixes, all the tests in jitify_example.cpp pass. I didn't see anything related to the callbacks; maybe it was some kind of current-working-directory problem?

@benbarsdell Fantastic. Looking forward to looking at your Itanium demangler code! I will take another look at the callbacks issue. Probably user error.

The fixes are available in #45. Let me know if anything doesn't work for you.

@mondus @Robadob
I added the #include <cctype> fix to #45. Are you OK with it being merged?

Building with the current status of #45 and VS2015, it works out the box for me. These are the remaining compile warnings I get (I think they're significantly reduce from what Mondus showed me the other week, though I haven't tried it in VS2020).

I haven't tested with Curand, I don't think Mondus shared that code with me. I've sent him an email and will try and chase up if he's in the office today.

Might be worth addressing them. Though, if it blocks our CI I can probably just wrap the include to reduce warning level.

C:\Users\Robadob\Documents\Visual Studio 2015\Projects\Jitify>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -x cu -rdc=true -I"C:\Program Files (x86)\dlfcn-win32\include" -I"C:\Program Files (x86)\Visual Leak Detector\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc140.pdb /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "C:\Users\Robadob\Documents\Visual Studio 2015\Projects\Jitify\kernel.cu"
1>c:\program files\nvidia gpu computing toolkit\cuda\v10.1\include\vector_types.h(423): warning : calling a __host__ function("std::_Iterator_base12::_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::_Iterator_base12 [subobject]") is not allowed
1>c:\program files\nvidia gpu computing toolkit\cuda\v10.1\include\vector_types.h(423): warning : calling a __host__ function("std::_Iterator_base12::~_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::~_Iterator_base12 [subobject]") is not allowed
1>  kernel.cu
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2259): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2087): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2793): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(3564): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data
1>c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(217): warning C4800: 'unsigned __int64': forcing value to bool 'true' or 'false' (performance warning)
1>  c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(217): note: while compiling class template member function 'bool jitify::ObjectCache<unsigned __int64,jitify::ProgramConfig>::contains(const unsigned __int64 &) const'
1>  c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2894): note: see reference to function template instantiation 'bool jitify::ObjectCache<unsigned __int64,jitify::ProgramConfig>::contains(const unsigned __int64 &) const' being compiled
1>  c:\users\robadob\documents\visual studio 2015\projects\jitify\jitify/jitify.hpp(2357): note: see reference to class template instantiation 'jitify::ObjectCache<unsigned __int64,jitify::ProgramConfig>' being compiled

@benbarsdell Sorry for the delay. Other academic duties have moved me away form coding this week. I am sure your fix is fine but I will test this early next week and confirm.

No worries, thanks for the feedback so far. I pushed some more fixes for conversion warnings. I'll merge #45 now (I have a follow-up PR to submit) and we can address any other issues that come up next week.

This works for me. Cheers.