stotko/stdgpu

Build fails on Windows 11 + CUDA 12.5 + MSVC 19.41 + CMake 3.29.4 with "STDGPU_BACKEND=STDGPU_BACKEND_CUDA"


Describe the bug

Building the Visual Studio project fails when the CUDA backend is used with CUDA 12.5.

Steps to reproduce

  1. Prerequisites:
    1. Windows 11,
    2. CUDA 12.5,
    3. MSVC 19.41 (VS 2022 Preview),
    4. CMake 3.29.4,
    5. Download stdgpu source.
  2. Configure the CMake cache and generate the VS project files using the CMake GUI with STDGPU_BACKEND set to STDGPU_BACKEND_CUDA.
  3. Open the VS solution in VS 2022 Preview.
  4. Build the stdgpu project.

Expected behavior

The build succeeds.

Actual behavior

The build fails.

CMake configuration output:

Selecting Windows SDK version 10.0.20348.0 to target Windows 10.0.22610.
Created device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=/W2>
Created test device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Wno-deprecated-declarations>
Detected user-provided CCs : 52
Created host flags : $<$<COMPILE_LANGUAGE:CXX>:/W2>
Created test host flags : $<$<COMPILE_LANGUAGE:CXX>:/wd4996>
Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) (Required is at least version "1.9.1")
CMake Deprecation Warning at test/googletest-1.11.0/CMakeLists.txt:4 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


CMake Deprecation Warning at test/googletest-1.11.0/googletest/CMakeLists.txt:56 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.



************************ stdgpu Configuration Summary *************************

General:
  Version                                   :   1.3.0
  System                                    :   Windows
  Build type                                :   

Build:
  STDGPU_BACKEND                            :   STDGPU_BACKEND_CUDA
  STDGPU_BUILD_SHARED_LIBS                  :   OFF
  STDGPU_SETUP_COMPILER_FLAGS               :   ON
  STDGPU_TREAT_WARNINGS_AS_ERRORS           :   OFF
  STDGPU_ANALYZE_WITH_CLANG_TIDY            :   OFF
  STDGPU_ANALYZE_WITH_CPPCHECK              :   OFF

Configuration:
  STDGPU_ENABLE_CONTRACT_CHECKS             :   ON
  STDGPU_USE_32_BIT_INDEX                   :   ON

Examples:
  STDGPU_BUILD_EXAMPLES                     :   ON

Tests:
  STDGPU_BUILD_TESTS                        :   ON
  STDGPU_BUILD_TEST_COVERAGE                :   OFF

Documentation:
  Doxygen                                   :   NO

*******************************************************************************

Configuring done (4.1s)

VS building output:

Build started at 21:00...
1>------ Build started: Project: ZERO_CHECK, Configuration: Debug x64 ------
1>1>Checking Build System
2>------ Build started: Project: stdgpu, Configuration: Debug x64 ------
2>Building Custom Rule E:/Repos/open3d/build/stdgpu/src/ext_stdgpu/src/stdgpu/CMakeLists.txt
2>iterator.cpp
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(90,29): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(101,29): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(115,55): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(130,55): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(208,40): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(218,49): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(252,37): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(263,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(271,5): error C3861: '__syncthreads': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(280,12): error C3861: '__syncthreads_and': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(289,12): error C3861: '__syncthreads_or': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(298,5): error C3861: '__syncwarp': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(307,12): error C3861: '__any_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(316,12): error C3861: '__all_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(325,12): error C3861: '__ballot_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(335,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(346,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(357,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(368,12): error C3861: '__shfl_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(377,35): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(388,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(398,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(406,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(415,39): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(416,40): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(417,13): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(427,34): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(438,34): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(479,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(489,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(499,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(509,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cuda\std\detail\libcxx\include\__cuda\ptx\ptx_helper_functions.h(40,44): error C3861: '__cvta_generic_to_shared': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cuda\std\detail\libcxx\include\__cuda\ptx\ptx_helper_functions.h(60,44): error C3861: '__cvta_generic_to_global': identifier not found
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,26): error C3856: 'is_proxy_reference': symbol is not a class template
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,70): error C2065: 'Container': undeclared identifier
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,43): error C2923: 'stdgpu::detail::back_insert_iterator_proxy': 'Container' is not a valid template type argument for parameter 'Container'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,43): error C2143: syntax error: missing ';' before 'stdgpu::detail::back_insert_iterator_proxy'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,79): error C2059: syntax error: '>'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,7): error C2059: syntax error: 'public'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,22): error C2872: 'detail': ambiguous symbol
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,30): error C2039: 'true_type': is not a member of 'thrust::detail'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(452,1): error C2143: syntax error: missing ';' before '{'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(452,1): error C2447: '{': missing function header (old-style formal list?)
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(459,22): error C2872: 'detail': ambiguous symbol
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(467,22): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,38): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,46): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,67): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,67): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,40): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,48): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,69): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,69): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,40): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,48): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,69): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,69): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,42): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,50): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,71): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,71): error C2143: syntax error: missing ',' before '<'
2>limits.cpp
2>Generating Code...
2>Done building project "stdgpu.vcxproj" -- FAILED.
========== Build: 1 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
========== Build completed at 21:00 and took 01.772 seconds ==========

System:

  • OS: Windows 11
  • Compiler: MSVC 19.41 (Visual Studio 2022 Preview)
  • Backend: CUDA 12.5
  • Library version: unknown (the copy provided by Open3D)

With STDGPU_BACKEND=STDGPU_BACKEND_OPENMP, everything builds normally.

I can reproduce these compilation errors on Ubuntu 22.04 + CUDA 12.5 with the latest commit from the master branch. Only the CUDA backend seems to be affected. More precisely, I suspect that the problem lies somewhere in thrust, since several CUDA-only expressions coming from there are incorrectly used during the compilation of a .cpp file (in your case iterator.cpp).

A very similar error has also been reported in Open3D, but in a different part of it: isl-org/Open3D#6813

I saw the same issue when building Open3D with -DBUILD_CUDA_MODULE=ON on Ubuntu 22.04 + CUDA 12.5, using the Open3D/stdgpu CMake setup:
GIT_REPOSITORY https://github.com/stotko/stdgpu.git
GIT_TAG master


In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/barrier_cluster.h:30,
from /usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx.h:74,
from /usr/local/cuda/include/cuda/ptx:19,
from /usr/local/cuda/include/cuda/discard_memory:25,
from /usr/local/cuda/include/cub/util_device.cuh:57,
from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:48,
from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:34,
from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:50,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:30,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:77,
from /usr/local/cuda/include/thrust/detail/reference.h:36,
from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/../stdgpu/iterator.h:29,
from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/impl/iterator.cpp:16:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘uint32_t cuda::ptx::__4::__as_ptr_smem(const void*)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:40:44: error: ‘__cvta_generic_to_shared’ was not declared in this scope
40 | return static_cast<_CUDA_VSTD::uint32_t>(__cvta_generic_to_shared(__ptr));
| ^~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘uint64_t cuda::ptx::__4::__as_ptr_gmem(const void*)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:60:44: error: ‘__cvta_generic_to_global’ was not declared in this scope
60 | return static_cast<_CUDA_VSTD::uint64_t>(__cvta_generic_to_global(__ptr));
| ^~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘_Tp* cuda::ptx::__4::__from_ptr_smem(size_t)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:73:33: error: there are no arguments to ‘__cvta_shared_to_generic’ that depend on a template parameter, so a declaration of ‘__cvta_shared_to_generic’ must be available [-fpermissive]
73 | return reinterpret_cast<_Tp*>(__cvta_shared_to_generic(__ptr));
| ^~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:73:33: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘_Tp* cuda::ptx::__4::__from_ptr_gmem(size_t)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:94:33: error: there are no arguments to ‘__cvta_global_to_generic’ that depend on a template parameter, so a declaration of ‘__cvta_global_to_generic’ must be available [-fpermissive]
94 | return reinterpret_cast<_Tp*>(__cvta_global_to_generic(__ptr));
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:48,
from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:34,
from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:50,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:30,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:77,
from /usr/local/cuda/include/thrust/detail/reference.h:36,
from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/../stdgpu/iterator.h:29,
from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/impl/iterator.cpp:16:
/usr/local/cuda/include/cub/util_device.cuh: In static member function ‘static typename AgentT::TempStorage& cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl::get_temp_storage(cub::CUB_200400___CUDA_ARCH_LIST___NS::NullType&, cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_t&)’:
/usr/local/cuda/include/cub/util_device.cuh:160:63: error: ‘blockIdx’ was not declared in this scope
160 | static_cast<char*>(vsmem.gmem_ptr) + (vsmem_per_block * blockIdx.x));
| ^~~~~~~~
/usr/local/cuda/include/cub/util_device.cuh: In static member function ‘static bool cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl::discard_temp_storage(typename AgentT::TempStorage&)’:
/usr/local/cuda/include/cub/util_device.cuh:201:38: error: ‘threadIdx’ was not declared in this scope
201 | const std::size_t linear_tid = threadIdx.x;
| ^~~~~~~~~
/usr/local/cuda/include/cub/util_device.cuh:202:50: error: ‘blockDim’ was not declared in this scope
202 | const std::size_t block_stride = line_size * blockDim.x;
| ^~~~~~~~

__syncthreads is defined in <CUDA_HOME>/targets/x86_64-linux/include/device_functions.h. This header is supposed to be pulled in by cuda_runtime.h or cuda_runtime_api.h, but it is included only when __CUDACC__ is defined, which indicates that nvcc should be used instead of c++.

Using nvcc instead of g++ (or any other supported host compiler) would work around the problem, but it would not address the root cause of this issue. iterator.cpp is designed to be consumed by host compilers: no device-only code is involved there, and any host-device code is properly guarded by the respective macro logic. Unfortunately, this no longer seems to be the case on thrust's side for version 2.4.x (bundled with CUDA 12.5), whereas older versions, i.e. <= 2.3.x, work fine.
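To illustrate what "properly guarded by macro logic" means here, the following is a minimal sketch of the general pattern, not stdgpu's or thrust's actual code. The macro EXAMPLE_HOST_DEVICE and both function names are hypothetical. The idea is that anything touching device-only built-ins such as threadIdx stays behind __CUDACC__, so a plain host compiler never sees it.

```cpp
// Hypothetical annotation macro (an assumption for this sketch):
// expands to CUDA qualifiers only when nvcc is the compiler.
#if defined(__CUDACC__)
    #define EXAMPLE_HOST_DEVICE __host__ __device__
#else
    #define EXAMPLE_HOST_DEVICE
#endif

// Pure arithmetic: valid for host compilers and for nvcc alike,
// so it may live in a header that a .cpp file includes.
EXAMPLE_HOST_DEVICE constexpr unsigned int
example_lane_index(unsigned int linear_id)
{
    return linear_id % 32u;
}

#if defined(__CUDACC__)
// Device-only helper: threadIdx exists only under nvcc, so the whole
// definition is hidden from host compilers such as g++ or cl.exe.
__device__ inline unsigned int
example_current_lane()
{
    return example_lane_index(threadIdx.x);
}
#endif
```

With a host compiler, only example_lane_index is visible and the translation unit compiles cleanly. The errors reported in this thread look like what happens when the second kind of function loses its guard.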

I did some further investigation, and it looks like the error disappears again with thrust 2.5.0. Although I still need to test this more thoroughly to confirm, chances are that CUDA 12.6 will include this newer thrust version and, in turn, a proper fix.
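For anyone who wants to detect the affected range in their own code, here is a rough sketch based on thrust's version encoding (major * 100000 + minor * 100 + subminor, as exposed by THRUST_VERSION in <thrust/version.h>). The helper names are hypothetical, and the "affected" range reflects only what has been observed in this thread (2.4.x fails, <= 2.3.x works), so it should not be read as a complete compatibility check.

```cpp
// Decode an encoded thrust version number,
// e.g. 200400 for thrust 2.4.0. All names are hypothetical.
constexpr int example_thrust_major(int encoded)
{
    return encoded / 100000;
}

constexpr int example_thrust_minor(int encoded)
{
    return (encoded / 100) % 1000;
}

// True for the 2.4.x series, which is the only range reported as
// broken for host-only translation units in this thread.
constexpr bool example_is_affected_thrust(int encoded)
{
    return example_thrust_major(encoded) == 2
        && example_thrust_minor(encoded) == 4;
}
```

In a real build, one would pass THRUST_VERSION to example_is_affected_thrust, for instance inside a static_assert that emits a readable message instead of the wall of errors above.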

I think I have a similar, though not identical, problem with CUDA 12.6 (thrust 2.5.0). It also seems to be related to thrust, this time around the specialization of thrust::detail::is_proxy_reference:

F:\temp\stdgpu\build(master -> origin)
λ cmake .. -GNinja
-- Configuring with CMake 3.27.9
-- The CXX compiler identification is MSVC 19.39.33521.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 12.6.20
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Created device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=/W2>
-- Created test device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Wno-deprecated-declarations>
-- Detecting CCs of GPUs : F:/temp/stdgpu/cmake/cuda/compute_capability.cpp
-- Detecting CCs of GPUs : F:/temp/stdgpu/cmake/cuda/compute_capability.cpp - Success (found CCs : 86)
-- Enabled compilation for CC 86
-- Created host flags : $<$<COMPILE_LANGUAGE:CXX>:/W2>
-- Created test host flags : $<$<COMPILE_LANGUAGE:CXX>:/wd4996>
-- Could NOT find ClangFormat: Found unsuitable version "16.0.4", but required is exact version "10" (found C:/Program Files/LLVM/bin/clang-format.exe)
-- Found thrust: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/include (found suitable version "2.5.0", minimum required is "1.9.9")
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/include (found suitable version "12.6.20", minimum required is "11.0")
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.44.0.windows.1")
-- Google Benchmark version: v0.0.0, normalized to 0.0.0
-- Looking for shm_open in rt
-- Looking for shm_open in rt - not found
-- Compiling and running to test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX -- success
-- Compiling and running to test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Compiling and running to test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX -- failed to compile
-- Compiling and running to test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Compiling and running to test HAVE_PTHREAD_AFFINITY
-- Performing Test HAVE_PTHREAD_AFFINITY -- failed to compile
-- The C compiler identification is MSVC 19.39.33521.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Python3: C:/Python310/python.exe (found version "3.10.4") found components: Interpreter
--
-- ************************ stdgpu Configuration Summary *************************
--
-- General:
--   Version                                   :   1.3.0
--   System                                    :   Windows
--   Build type                                :   Debug
--
-- Build:
--   STDGPU_BACKEND                            :   STDGPU_BACKEND_CUDA
--   STDGPU_BUILD_SHARED_LIBS                  :   OFF
--   STDGPU_SETUP_COMPILER_FLAGS               :   ON
--   STDGPU_COMPILE_WARNING_AS_ERROR           :   OFF
--   STDGPU_ANALYZE_WITH_CLANG_TIDY            :   OFF
--   STDGPU_ANALYZE_WITH_CPPCHECK              :   OFF
--
-- Configuration:
--   STDGPU_ENABLE_CONTRACT_CHECKS             :   ON
--   STDGPU_USE_32_BIT_INDEX                   :   ON
--
-- Examples:
--   STDGPU_BUILD_EXAMPLES                     :   ON
--
-- Benchmarks:
--   STDGPU_BUILD_BENCHMARKS                   :   ON
--
-- Tests:
--   STDGPU_BUILD_TESTS                        :   ON
--   STDGPU_BUILD_TEST_COVERAGE                :   OFF
--   CMAKE_VERIFY_INTERFACE_HEADER_SETS        :   <Not Defined> (-> OFF)
--
-- Documentation:
--   STDGPU_BUILD_DOCUMENTATION                :   OFF
--
-- *******************************************************************************
--
-- Configuring done (26.6s)
-- Generating done (0.0s)
-- Build files have been written to: F:/temp/stdgpu/build

F:\temp\stdgpu\build(master -> origin)
λ cmake --build .
[5/84] Building CXX object src\stdgpu\CMakeFiles\stdgpu.dir\impl\iterator.cpp.obj
FAILED: src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.obj
C:\PROGRA~1\MICROS~4\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\cl.exe  /nologo /TP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -IF:\temp\stdgpu\src\stdgpu\.. -IF:\temp\stdgpu\build\src\stdgpu\include -external:I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -external:W0 /DWIN32 /D_WINDOWS /EHsc /Ob0 /Od /RTC1 -std:c++17 -MDd -Zi /W2 /showIncludes /Fosrc\stdgpu\CMakeFiles\stdgpu.dir\impl\iterator.cpp.obj /Fdsrc\stdgpu\CMakeFiles\stdgpu.dir\stdgpu.pdb /FS -c F:\temp\stdgpu\src\stdgpu\impl\iterator.cpp
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C3856: 'is_proxy_reference': symbol is not a class template
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C2065: 'Container': undeclared identifier
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C2923: 'stdgpu::detail::back_insert_iterator_proxy': 'Container' is not a valid template type argument for parameter 'Container'
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): note: see declaration of 'Container'
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C2143: syntax error: missing ';' before 'stdgpu::detail::back_insert_iterator_proxy'
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C2059: syntax error: '>'
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C2059: syntax error: 'public'
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(394): error C2872: 'detail': ambiguous symbol
F:\temp\stdgpu\src\stdgpu/impl/iterator_detail.h(390): note: could be 'thrust::detail'
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include\thrust/iterator/detail/discard_iterator_base.h(40): note: or       'thrust::THRUST_200500___CUDA_ARCH_LIST___NS::detail'
.... and more errors

I used it before with CUDA 12.3, and compilation was fine.
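My reading of the "ambiguous symbol" notes (a guess from the log, not a confirmed diagnosis) is that iterator_detail.h opens a plain thrust::detail, while thrust 2.5.0 keeps its own detail inside a versioned inline namespace, so MSVC ends up seeing two different detail namespaces. Below is a minimal sketch of a specialization pattern that avoids this, using a mock library in place of thrust. Every name in it (mocklib, v2_5, the MOCKLIB_* macros, back_insert_proxy) is hypothetical.

```cpp
#include <type_traits>

// Mock library standing in for thrust. The macros wrap all library
// code in a versioned inline namespace.
#define MOCKLIB_NAMESPACE_BEGIN namespace mocklib { inline namespace v2_5 {
#define MOCKLIB_NAMESPACE_END   } }

MOCKLIB_NAMESPACE_BEGIN
namespace detail
{
// Primary trait: types are not proxy references by default.
template <typename T>
struct is_proxy_reference : std::false_type {};
} // namespace detail
MOCKLIB_NAMESPACE_END

namespace example
{
// Hypothetical user-side proxy type.
struct back_insert_proxy {};
} // namespace example

// Reopen the library's namespace through the library's own macros, so
// the specialization lands in mocklib::v2_5::detail and not in a
// second, unrelated mocklib::detail.
MOCKLIB_NAMESPACE_BEGIN
namespace detail
{
template <>
struct is_proxy_reference<example::back_insert_proxy> : std::true_type {};
} // namespace detail
MOCKLIB_NAMESPACE_END
```

As far as I know, thrust provides THRUST_NAMESPACE_BEGIN and THRUST_NAMESPACE_END for the same purpose. Whether using them is the right fix for stdgpu is for the maintainers to judge.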