C++ header "functional" not supported?

Question


leofang opened this issue 4 years ago · 8 comments

I am testing Jitify with a simple piece of Thrust code that only includes <thrust/count.h>. Following the include chain, it eventually reaches my /usr/local/cuda-10.2/include/thrust/functional.h, which includes <functional>:
https://github.com/NVIDIA/thrust/blob/deac895a041fc5fc6443b49f6846f6bbdcb60756/thrust/functional.h#L25
But it looks like <functional> is not part of Jitify's get_jitsafe_headers_map? The error I got is:

/usr/local/cuda-10.2/include/thrust/functional.h not in loaded sources! This may be due to a header being loaded by NVRTC without Jitify's knowledge.

Is there any workaround that I can do as a user? Thanks.

Answer 1 · 2020-11-05T10:07:44.000Z

Thanks for the report. Can you provide code to reproduce the error? Also just to check, are you using the latest version of jitify?

Answer 2 · 2020-11-05T16:28:55.000Z

Hi @benbarsdell, thanks for the reply. I am adding Jitify support to CuPy (cupy/cupy#4228), so my test code is a mix of Python and C++, and I don't have a clean, self-contained C++ reproducer at the moment. I'll try to create one later, but I think any minimal working example you have at hand would reproduce the same error if you add one extra line that includes a Thrust header such as <thrust/count.h> or <thrust/reduce.h> (the two I tested).

And yes, I added Jitify to CuPy as a submodule, and I pull from master.

Answer 3 · 2020-11-06T21:58:43.000Z

Hi @benbarsdell, below is the promised C++-only reproducer, essentially a condensed version of jitify_example.cpp:

// compile this code with: nvcc -arch=sm_75 --std=c++11 -lcuda -lcudart -lnvrtc test_jitify.cu -o test_jitify
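#include <cmath>
#include <iostream>
#include <string>
#include <vector>

#include "jitify.hpp"

// Evaluate `call` exactly once, so that a failing launch is not executed a
// second time just to obtain its error name.
#define CHECK_CUDA(call)                                                  \
  do {                                                                    \
    CUresult status_ = (call);                                            \
    if (status_ != CUDA_SUCCESS) {                                        \
      const char* str;                                                    \
      cuGetErrorName(status_, &str);                                      \
      std::cout << "(CUDA) returned " << str;                             \
      std::cout << " (" << __FILE__ << ":" << __LINE__ << ":" << __func__ \
                << "())" << std::endl;                                    \
      return false;                                                       \
    }                                                                     \
  } while (0)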

template <typename T, typename U>
bool are_close(T in, U out) {
  return fabs(in - out) <= 1e-5f * fabs(in);
}

template <typename T>
bool test_simple_experimental() {
  const char* program_source =
      "my_program\n"
      "#include <thrust/reduce.h>\n"  // this line will make Jitify fail
      "template<int N, typename T>\n"
      "__global__\n"
      "void my_kernel(T* data) {\n"
      "    T data0 = data[0];\n"
      "    for( int i=0; i<N-1; ++i ) {\n"
      "        data[0] *= data0;\n"
      "    }\n"
      "}\n";
  std::vector<std::string> opts;
  jitify::experimental::Program program_orig(program_source, {}, opts);
  auto program =
      jitify::experimental::Program::deserialize(program_orig.serialize());
  T h_data = 5;
  T* d_data;
  cudaMalloc((void**)&d_data, sizeof(T));
  cudaMemcpy(d_data, &h_data, sizeof(T), cudaMemcpyHostToDevice);
  dim3 grid(1);
  dim3 block(1);
  using jitify::reflection::type_of;
  auto kernel_inst_orig =
      program.kernel("my_kernel").instantiate(3, type_of(*d_data));
  auto kernel_inst = jitify::experimental::KernelInstantiation::deserialize(
      kernel_inst_orig.serialize());
  CHECK_CUDA(kernel_inst.configure(grid, block).launch(d_data));
  cudaMemcpy(&h_data, d_data, sizeof(T), cudaMemcpyDeviceToHost);
  cudaFree(d_data);
  return are_close(h_data, 125.f);
}

int main() {
  std::cout << test_simple_experimental<int>() << std::endl;
  return 0;
}

Without the extra line that includes a Thrust header (which the kernel does not even use), the code builds and runs correctly (output: 1). With the extra include, the code still builds but fails at runtime, when Jitify compiles the program:

$ ./test_jitify
  assert.h
  iterator
  jitify_preinclude.h
  limits.h
  math.h
  memory.h
  my_program
  stdint.h
  stdio.h
  stdlib.h
  string.h
  time.h
  utility
terminate called after throwing an instance of 'std::out_of_range'
  what():  /usr/include/thrust/functional.h not in loaded sources! This may be due to a header being loaded by NVRTC without Jitify's knowledge.
Aborted (core dumped)

which is exactly what I reported earlier.
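The last lines of that log come from an uncaught C++ exception: `std::map::at` throws `std::out_of_range` when the key is absent, and an exception that escapes `main` ends in `std::terminate`, which in the GNU C++ runtime prints "terminate called after throwing an instance of ..." and aborts. A minimal sketch, assuming a hypothetical map of loaded header sources keyed by path (`SourceMap`, `lookup_loaded_source` and `try_lookup` are illustrative names, not Jitify's actual data structures), that produces the same kind of message and shows how catching it turns the abort into a readable error:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the header sources a JIT loader has read itself,
// keyed by the resolved file path.
using SourceMap = std::map<std::string, std::string>;

// Looks up a header that the compiler reports having used. If the loader
// never saw that file, rethrow the same exception type with a clearer text.
const std::string& lookup_loaded_source(const SourceMap& loaded,
                                        const std::string& path) {
  try {
    return loaded.at(path);  // throws std::out_of_range if path is absent
  } catch (const std::out_of_range&) {
    throw std::out_of_range(path + " not in loaded sources!");
  }
}

// Returns "ok" or the error text instead of letting the exception reach
// main(), where it would call std::terminate and abort the process.
std::string try_lookup(const SourceMap& loaded, const std::string& path) {
  try {
    lookup_loaded_source(loaded, path);
    return "ok";
  } catch (const std::out_of_range& e) {
    return e.what();
  }
}
```

For example, with a map containing only `my_program` and `stdint.h`, looking up `/usr/include/thrust/functional.h` yields the "not in loaded sources!" text rather than a core dump.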

Answer 4 · 2020-11-06T21:58:57.000Z

cc: @maddyscientist

Answer 5 · 2020-11-10T00:03:12.000Z

Unfortunately I can't reproduce the exact error you're getting.

Without specifying an include path, I get:

my_program(1): warning: thrust/reduce.h: [jitify] File not found

After adding the runtime compilation option -I/usr/local/cuda/include, I get:

thrust/functional.h(25): warning: functional: [jitify] File not found

and then many other NVRTC compilation errors.
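Both warnings are consistent with a lookup that first checks a small table of built-in jit-safe headers and then searches the -I directories in order; a host standard-library header such as <functional> that is in neither place is reported as "File not found", since NVRTC does not bundle the host C++ standard library. The toy model below is a sketch under that assumption only: `ToyResolver`, its fields, and any table contents you put in it are hypothetical, not Jitify's real code or header list.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of resolving one #include name (hypothetical, for illustration).
// Assumed order: built-in jit-safe names first, then each -I directory.
struct ToyResolver {
  std::set<std::string> jitsafe_names;    // e.g. "stdint.h", "limits.h"
  std::vector<std::string> include_dirs;  // from -I options, in order
  std::set<std::string> files;            // full paths that exist on disk

  // Returns the resolved location, or "" to mean "File not found".
  std::string resolve(const std::string& name) const {
    if (jitsafe_names.count(name)) return "<builtin>/" + name;
    for (const std::string& dir : include_dirs) {
      std::string path = dir + "/" + name;
      if (files.count(path)) return path;
    }
    return "";
  }
};
```

With no include directories, `thrust/reduce.h` is not found (the first warning); after adding `/usr/local/cuda/include`, it resolves, but `functional` still does not (the second warning). In the reproducer above, such an option would go into the `opts` vector passed to the `jitify::experimental::Program` constructor, e.g. `opts.push_back("-I/usr/local/cuda/include");`.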

Can you confirm that you're at commit 3e96bcc?

Perhaps the bigger issue, though, is that Thrust headers in general are not supported (the only one we test is counting_iterator.h). Thrust itself does not yet fully support NVRTC, and there are too many issues for us to work around them all in Jitify. I'll note that things are a bit better with CUB (at least the block-level API).

Answer 6 · 2020-11-12T05:03:31.000Z

Thanks for checking, @benbarsdell.

> Unfortunately I can't reproduce the exact error you're getting.

Yeah, this is really weird. Why would <functional> cause a warning for you but an error for me?

> Can you confirm that you're at commit 3e96bcc?

I am 100% certain, as I just cloned Jitify from master. I noticed there's a jitify2 branch; any chance you were on that branch? 🙂 (By the way, is there an expected timeline for it to become the default branch?)

> Perhaps the bigger issue, though, is that Thrust headers in general are not supported (the only one we test is counting_iterator.h). Thrust itself does not yet fully support NVRTC, and there are too many issues for us to work around them all in Jitify. I'll note that things are a bit better with CUB (at least the block-level API).

Thanks for the info, Ben. I was not aware of this, and thought #39 had brought broad Thrust support to Jitify.

My original motivation for trying Thrust + Jitify was a user report to CuPy in which Thrust's count.h was used: cupy/cupy#3728 (comment). But I can't make that code work when I switch the compiler from nvcc to Jitify + NVRTC (I get the error reported above).

Yes, for CUB block API Jitify seems to be working fine! Many thanks for the nice work 🙏

Answer 7 · 2020-11-13T15:11:23.000Z

From @benbarsdell in #82 (comment):

> because your Thrust library is installed in /usr/include/thrust. There is a known issue in NVRTC < 11.0 where /usr/include is automatically added as an include path, and this circumvents Jitify's header loading functionality.
> This should be fixed if you use CUDA >= 11.0 or if you move Thrust to somewhere else (or simply remove that directory and use the copy that comes in the CUDA Toolkit in /usr/local/cuda/include/thrust).
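The quoted explanation can be illustrated with a toy model that assumes only what the comment states: an NVRTC older than 11.0 also searches /usr/include, so it can open a Thrust header there on its own, and the path it ends up using was never loaded by Jitify. The function names, the version check, and placing the implicit directory after any user -I directories are illustrative assumptions, not NVRTC's or Jitify's actual code.

```cpp
#include <set>
#include <string>
#include <vector>

// Directories the compiler searches in this toy model. For NVRTC < 11.0 the
// quoted issue says /usr/include is added implicitly (its position relative
// to -I directories is an assumption here).
std::vector<std::string> compiler_search_dirs(
    int nvrtc_major, const std::vector<std::string>& user_dirs) {
  std::vector<std::string> dirs = user_dirs;
  if (nvrtc_major < 11) dirs.push_back("/usr/include");
  return dirs;
}

// Returns the first existing path "<dir>/<name>", or "" if none exists.
std::string find_in(const std::vector<std::string>& dirs,
                    const std::set<std::string>& files,
                    const std::string& name) {
  for (const std::string& d : dirs) {
    if (files.count(d + "/" + name)) return d + "/" + name;
  }
  return "";
}

// True when the compiler opens a header by itself that the loader never
// loaded: the situation behind "... not in loaded sources!".
bool header_bypasses_loader(int nvrtc_major,
                            const std::set<std::string>& files,
                            const std::set<std::string>& loaded_by_jitify,
                            const std::string& name) {
  std::string path =
      find_in(compiler_search_dirs(nvrtc_major, {}), files, name);
  return !path.empty() && loaded_by_jitify.count(path) == 0;
}
```

In this model both suggested fixes remove the mismatch: with a major version of 11 the implicit directory disappears, and with Thrust moved out of /usr/include the compiler can no longer find the header by itself.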

I will try to verify this later today in a different environment with CUDA headers installed in the standard path instead of /usr/include/.

Answer 8 · 2021-05-19T04:00:03.000Z

Update: I verified that on CUDA 11.2 the original core-dump error (This may be due to a header being loaded by NVRTC without Jitify's knowledge) is gone; instead, a lot of compilation errors are thrown (which is kind of expected for Thrust). Thank you for helping, Ben! I am closing this issue.