libcu++ should not define __host__/__device__/__forceinline__ (or any other runtime macros)
jrhemstad opened this issue · 2 comments
In our __config
we have the following:
libcudacxx/include/cuda/std/detail/__config
Lines 14 to 39 in b8b37d6
this will define things like __host__
to be empty when we detect that we're not compiling with nvcc in order to silence any errors about unrecognized symbols that would occur from host compilers that don't understand what __host__
is.
That is a reasonable thing in many cases, but there are legitimate use cases where this can break. For example, the following file compiled with a host compiler will fail to compile:
#include <cuda/std/type_traits>
#include <cuda_runtime_api.h>
This will fail due to a redefinition error of __host__
and friends because cuda/std/type_traits
will detect we're not compiling with nvcc and define __host__
to empty, but then cuda_runtime_api.h
will also define __host__
, causing the redefinition error.
One could reasonably argue that the runtime headers should better guard against this situation by undefining __host__
before defining it. However, it is most expedient for us to fix this in libcu++.
The only robust solution I can think of is to avoid ever defining __host__
/__device__
/__forceinline__
ourselves and instead use our own macros like:
#if defined(__CUDACC__)
#define _LIBCUDACXX_HOST __host__
#else
#define _LIBCUDACXX_HOST
#endif
And use those in place of any where we'd use __host__
(or other macros).
I'd expect we should rarely be using host/device annotations directly anyways. Those should come from __LIBCUDACXX_INLINE_VISIBILITY
or just _LIBCUDACXX_EXECUTION_SPACE_SPECIFIER
.
It looks like the number of places we use __host__
or __device__
directly are pretty small, and they are almost always together.