pybind/pybind11

[BUG]: Cuda 12.1: error: expected template-name before ‘<’ token

dcbishop opened this issue · 7 comments

What version (or hash if on master) of pybind11 are you using?

2.10.4

Problem description

/usr/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/usr/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/usr/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
/usr/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/usr/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                              ^

Reproducible example code

echo "#include <pybind11/functional.h>" >> t.cu
nvcc -isystem=/usr/include/python3.10 -c t.cu

Is this a regression? Put the last known working version here if it is.

Not a regression

I fixed this by editing /usr/include/pybind11/cast.h:

-    return caster.operator typename make_caster<T>::template cast_op_type<T>();
+    return caster;

It seems nvcc 12.x cannot parse this line correctly. I'm not sure whether this is an nvcc bug, but maybe we should just use plainer syntax for the cast, either an implicit conversion or a C-style cast, rather than .operator T...
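
For reference, the line nvcc rejects is an explicit call to a conversion operator whose target type is a dependent member alias template. Here is a minimal sketch in standard C++, using a hypothetical toy_caster rather than pybind11's real type_caster. It shows the original spelling next to an equivalent one that names the type through a using alias first (result_t is an illustrative name). Both are valid C++. Whether the alias form avoids the nvcc 12.1 parse error is an assumption and is not verified in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for a pybind11 type caster; not the real type_caster.
template <typename T>
struct toy_caster {
    T value{};

    // Member alias template, playing the role of cast_op_type<T>.
    template <typename U>
    using cast_op_type = T &;

    // The conversion operator that cast_op is meant to invoke.
    operator T &() { return value; }
};

// Original spelling: the dependent target type is written inline
// after the `operator` keyword.
template <typename T>
typename toy_caster<T>::template cast_op_type<T> cast_op_inline(toy_caster<T> &caster) {
    return caster.operator typename toy_caster<T>::template cast_op_type<T>();
}

// Same call, but the dependent type is named through a local alias first.
template <typename T>
typename toy_caster<T>::template cast_op_type<T> cast_op_alias(toy_caster<T> &caster) {
    using result_t = typename toy_caster<T>::template cast_op_type<T>;
    return caster.operator result_t();
}

// Calls both spellings and checks that they refer to the same object.
inline int &checked_cast(toy_caster<int> &caster) {
    int &a = cast_op_inline<int>(caster);
    int &b = cast_op_alias<int>(caster);
    assert(&a == &b);
    return a;
}
```

Unlike `return caster;`, both spellings still name one specific conversion operator, so they keep working when a caster offers more than one.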

I also encountered this issue when building https://github.com/NVIDIA/apex on ArchLinux with cuda 12.1.0, gcc 12.2.1, pybind11 2.10.4.

/opt/cuda/bin/nvcc -I/usr/lib/python3.10/site-packages/torch/include -I/usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.10/site-packages/torch/include/TH -I/usr/lib/python3.10/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.10 -c csrc/mlp_cuda.cu -o build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=mlp_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -std=c++17
/usr/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/usr/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/usr/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
/usr/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/usr/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                              ^
error: command '/opt/cuda/bin/nvcc' failed with exit code 1

But I could build it with gcc 10.4.0.

I was able to confirm via an NVIDIA contact that this is indeed an nvcc bug and they've opened an internal issue for it.

The takes_ref_wrap and takes_const_ref_wrap tests in test_builtin_casters.cpp fail to compile with the workaround applied:

include/pybind11/detail/../cast.h:47:12: error: invalid initialization of reference of type ‘pybind11::detail::type_caster<ConstRefCasted>::cast_op_type<ConstRefCasted&>’ {aka ‘ConstRefCasted&’} from expression of type ‘pybind11::detail::make_caster<ConstRefCasted&>’ {aka ‘pybind11::detail::type_caster<ConstRefCasted>’}
   47 |     return caster;
      |            ^~~~~~

They also fail to compile with an explicit cast:

include/pybind11/detail/../cast.h:46:12: error: conversion from ‘pybind11::detail::make_caster<const ConstRefCasted&>’ {aka ‘pybind11::detail::type_caster<ConstRefCasted>’} to ‘pybind11::detail::type_caster<ConstRefCasted>::cast_op_type<const ConstRefCasted&>’ {aka ‘const ConstRefCasted&’} is ambiguous
   46 |     return (typename make_caster<T>::template cast_op_type<T>)caster;
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
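
The two failures above are consistent with a caster that declares several explicit conversion operators. The sketch below uses a hypothetical toy_ref_caster. It is an assumption based on what the error messages suggest, not the actual code in test_builtin_casters.cpp. Implicit conversion is unavailable because the operators are explicit, while naming the conversion function directly selects exactly one of them.

```cpp
#include <cassert>
#include <type_traits>

struct Value {
    int tag = 0;
};

// Hypothetical caster: each operator records which overload ran.
struct toy_ref_caster {
    Value value;
    explicit operator Value &() { value.tag = 1; return value; }
    explicit operator const Value &() { value.tag = 2; return value; }
};

// `return caster;` relies on an implicit conversion, which explicit
// conversion operators do not take part in.
static_assert(!std::is_convertible<toy_ref_caster &, Value &>::value,
              "no implicit conversion to Value &");
static_assert(!std::is_convertible<toy_ref_caster &, const Value &>::value,
              "no implicit conversion to const Value &");

// Naming the conversion function directly selects exactly one operator.
inline int tag_via_mutable_ref(toy_ref_caster &caster) {
    Value &ref = caster.operator Value &();
    return ref.tag;
}

inline int tag_via_const_ref(toy_ref_caster &caster) {
    const Value &ref = caster.operator const Value &();
    return ref.tag;
}

// Runs both calls and checks that each reached its own operator.
inline void self_check() {
    toy_ref_caster caster;
    assert(tag_via_mutable_ref(caster) == 1);
    assert(tag_via_const_ref(caster) == 2);
}
```

A C-style cast to const Value & can consider more than one of these operators, which would likely produce the same kind of "ambiguous" diagnostic quoted above. The explicit .operator call has no such choice to make.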

I haven't had a chance to test yet, but NVIDIA says that this has been fixed in the recently released 12.2.

Cool, I can confirm that https://github.com/NVIDIA/apex builds on ArchLinux with cuda 12.2.0, gcc 13.1.1, pybind11 2.10.4.

I can build https://github.com/NVIDIA/apex with no issue on Debian 12 Bookworm with cuda 12.1.r12.1 and gcc 11.3.0