FLAMEGPU/FLAMEGPU2

Windows CI failures


At some point between 2024-04-26 and 2024-06-11 our Windows CI stopped working for the CUDA 11.8 and CUDA 12.3 builds (CUDA 11.0 builds are fine).

In both cases, CMake could not find CUDA language support:

CMake Warning at cmake/CheckCompilerFunctionality.cmake:23 (message):
  CUDA Language Support Not Found

This happens even though CUDA was installed successfully by the network installer to the expected location, with the PATH updated:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3

Probably worth debugging in a separate, much smaller repo to avoid CI spam and long CI times once it does work,
e.g. a fork of flamegpu with most CI disabled, or a fork of https://github.com/ptheywood/cuda-cmake-github-actions (the repo I originally used to develop the CUDA install scripts).

Things to check could be:

  • nvcc is correctly on the PATH immediately before the CMake configure step
  • nvcc can compile a test program manually, without CMake (CMake's CUDA detection does the same thing via a test build)
  • Check which package versions changed in the GitHub Actions runner image between those dates, and try to reproduce on a local Windows build
  • Output the CMake log files when the step fails, to see what error messages were produced during the CUDA language support check.
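A rough sketch of the first two checks as a CMake script, run with `cmake -P check_nvcc.cmake` in the CI step just before configuring. The file name, the `nvcc_smoke_test` directory and the trivial kernel are illustrative assumptions. On Windows, `cl.exe` must also be reachable (e.g. from a Developer Command Prompt), otherwise nvcc fails before it ever reaches the host compiler version check.

```cmake
# check_nvcc.cmake (hypothetical name); run with: cmake -P check_nvcc.cmake
cmake_minimum_required(VERSION 3.18)

# 1. Is nvcc on the PATH right now?
find_program(NVCC_EXECUTABLE nvcc)
if(NOT NVCC_EXECUTABLE)
  message(WARNING "nvcc was not found on PATH")
else()
  message(STATUS "nvcc on PATH: ${NVCC_EXECUTABLE}")
  execute_process(COMMAND "${NVCC_EXECUTABLE}" --version
                  OUTPUT_VARIABLE nvcc_version_text)
  message(STATUS "${nvcc_version_text}")

  # 2. Can nvcc compile a trivial program without CMake's try_compile machinery?
  set(smoke_dir "${CMAKE_CURRENT_BINARY_DIR}/nvcc_smoke_test")
  file(MAKE_DIRECTORY "${smoke_dir}")
  file(WRITE "${smoke_dir}/main.cu"
       "__global__ void kernel() {}\nint main() { kernel<<<1, 1>>>(); return 0; }\n")
  execute_process(COMMAND "${NVCC_EXECUTABLE}" -c main.cu -o main.obj
                  WORKING_DIRECTORY "${smoke_dir}"
                  OUTPUT_VARIABLE compile_out
                  ERROR_VARIABLE compile_err
                  RESULT_VARIABLE compile_result)
  if(compile_result EQUAL 0)
    message(STATUS "nvcc compiled a trivial program")
  else()
    # Print everything nvcc said, e.g. the host_config.h version error.
    message(WARNING "nvcc failed on a trivial program:\n${compile_out}${compile_err}")
  endif()
endif()
```

Failures are reported as warnings rather than errors so the CI log still reaches the CMake configure step afterwards.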

Possibly just a bad CI image: the failures were using 20240603.1.0, as in actions/runner-images#10004.

I've retriggered one of the failed workflows to see if the issue has magically gone away again.

https://github.com/FLAMEGPU/FLAMEGPU2/actions/runs/9463757119/job/27223363960


Not fixed by 20240630.1.0, so not the same issue. Probably still caused by some change in the base image.

It will be simpler to debug in a fork or a less complex repository (e.g. ptheywood/cuda-cmake-github-actions after some updates).

A CI run on that repo shows that nvcc decides the Visual Studio version is incompatible when CMake triggers its test build:

D:\a\cuda-cmake-github-actions\cuda-cmake-github-actions\build\CMakeFiles\CMakeScratch\TryCompile-um0ksd>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu    -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include"     --keep-dir cmTC_98de8\x64\Debug  -maxrregcount=0  --machine 64 --compile -cudart static -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FdcmTC_98de8.dir\Debug\vc143.pdb /FS /Zi /RTC1 /MDd /GR" -o cmTC_98de8.dir\Debug\main.obj "D:\a\cuda-cmake-github-actions\cuda-cmake-github-actions\build\CMakeFiles\CMakeScratch\TryCompile-um0ksd\main.cu" 
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include\crt/host_config.h(153): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [D:\a\cuda-cmake-github-actions\cuda-cmake-github-actions\build\CMakeFiles\CMakeScratch\TryCompile-um0ksd\cmTC_98de8.vcxproj]
      main.cu

However, the error says versions 2017 to 2022 (inclusive) are supported, and this is VS 2022.

This seems to be caused by Microsoft bumping the MSVC compiler version to 19.40 (_MSC_VER 1940) while still within VS 2022, whereas NVIDIA assumed that 1940 would belong to the next Visual Studio release (202X).
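To make the off-by-one concrete: the failing log above uses MSVC toolset 14.40, i.e. `_MSC_VER` 1940, which ships with VS 2022 17.10. Below is a sketch of the shape of the check as a CMake function. The accepted range `1910 <= _MSC_VER < 1940` is an assumption about the pre-12.4 `crt/host_config.h` and should be confirmed against the installed header; the function name is hypothetical.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical helper mimicking the host compiler check in older CUDA releases.
# ASSUMPTION: pre-12.4 host_config.h accepts 1910 <= _MSC_VER < 1940 as "2017 to 2022".
function(example_old_cuda_accepts_msvc msvc_version out_var)
  if(msvc_version GREATER_EQUAL 1910 AND msvc_version LESS 1940)
    set(${out_var} TRUE PARENT_SCOPE)
  else()
    set(${out_var} FALSE PARENT_SCOPE)
  endif()
endfunction()

# 1916: VS 2017 15.9, 1929: VS 2019 16.11, 1939: VS 2022 17.9, 1940: VS 2022 17.10
foreach(v 1916 1929 1939 1940 1941)
  example_old_cuda_accepts_msvc(${v} accepted)
  message(STATUS "_MSC_VER ${v}: accepted by pre-12.4 nvcc = ${accepted}")
endforeach()
```

Under this assumption every VS 2022 release from 17.10 onwards falls outside the range, even though it is still "2022".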

CUDA 12.5 should not complain, but older releases still will.

We need to tell CMake to pass --allow-unsupported-compiler when testing CUDA detection. Using -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler" might be enough to fix CI, but the issue will still affect any Windows users with a recent VS 2022 and CUDA < 12.5. We can probably set it conditionally in our CMake, but it might be a bit awkward to fit in between the C++ and CUDA language support checks.
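One way the conditional flag could be fitted between the C++ and CUDA checks, sketched under several assumptions: `EXAMPLE_ALLOW_UNSUPPORTED_MSVC` is a hypothetical opt-in option, C++ is enabled first so that `MSVC_VERSION` is known, and the flag is passed via the `CUDAFLAGS` environment variable (documented by CMake as seeding `CMAKE_CUDA_FLAGS` on the first configure) so that the nested CMake process started by `check_language()` also inherits it. This is untested, and it may still need CMake >= 3.29.4 for detection to honour the flag.

```cmake
cmake_minimum_required(VERSION 3.18)
# Enable CXX first so that MSVC and MSVC_VERSION are known before CUDA is checked.
project(example_cuda_flags LANGUAGES CXX)

# Hypothetical opt-in option, not an existing FLAMEGPU option.
option(EXAMPLE_ALLOW_UNSUPPORTED_MSVC
       "Pass --allow-unsupported-compiler to nvcc when MSVC_VERSION >= 1940" OFF)

# The CUDA version is unknown until detection succeeds, so this keys on MSVC only.
# Skip if CMAKE_CUDA_FLAGS is already cached (set via -D, or by a previous configure).
if(EXAMPLE_ALLOW_UNSUPPORTED_MSVC AND MSVC AND MSVC_VERSION GREATER_EQUAL 1940
   AND NOT DEFINED CACHE{CMAKE_CUDA_FLAGS})
  # CUDAFLAGS seeds CMAKE_CUDA_FLAGS on the first configure; child processes,
  # such as the nested CMake run by check_language(), inherit it.
  set(ENV{CUDAFLAGS} "$ENV{CUDAFLAGS} --allow-unsupported-compiler")
endif()

include(CheckLanguage)
check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  message(WARNING "CUDA Language Support Not Found")
endif()
```

Keeping it opt-in avoids silently disabling nvcc's host compiler check for everyone on a recent VS 2022.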

This might also require CMake >= 3.29.4 for compiler detection to use the user-provided flags. If I'm understanding this correctly, 3.29.4 fixes the CMake side anyway, but users will still need to explicitly allow unsupported compilers:

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9546

So it might just be that on Windows, with MSVC >= 19.40 and CUDA < 12.4, you must also use CMake >= 3.29.4 and explicitly allow unsupported compilers.

We can probably emit an appropriate error message, as we do for other compiler issues, if all these conditions are met (and a compiler check fails).
Probably as a nested if in here:

if(NOT CMAKE_CUDA_COMPILER)
    message(WARNING "CUDA Language Support Not Found")
    set(FLAMEGPU_CheckCompilerFunctionality_RESULT "NO" PARENT_SCOPE)
    return()
endif()
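A sketch of where the nested `if` could sit, written as a self-contained function so the placement is clear. `example_check_cuda_language` is a hypothetical stand-in for the real function in `cmake/CheckCompilerFunctionality.cmake`; it assumes C++ has already been enabled (so `MSVC` and `MSVC_VERSION` are set) and does not attempt to check the CUDA version.

```cmake
include(CheckLanguage)

# Hypothetical stand-in for the check in cmake/CheckCompilerFunctionality.cmake.
# Assumes the CXX language is already enabled, so MSVC and MSVC_VERSION are set.
function(example_check_cuda_language)
  check_language(CUDA)
  if(NOT CMAKE_CUDA_COMPILER)
    message(WARNING "CUDA Language Support Not Found")
    # Nested hint: older CUDA rejects MSVC 19.40+ in its host compiler check,
    # unless --allow-unsupported-compiler was already passed.
    if(MSVC AND MSVC_VERSION GREATER_EQUAL 1940
       AND NOT CMAKE_CUDA_FLAGS MATCHES "allow-unsupported-compiler")
      message(WARNING
        "MSVC_VERSION ${MSVC_VERSION} may be rejected by CUDA < 12.4. Consider "
        "reconfiguring with -DCMAKE_CUDA_FLAGS=\"--allow-unsupported-compiler\" "
        "(requires CMake >= 3.29.4).")
    endif()
    set(FLAMEGPU_CheckCompilerFunctionality_RESULT "NO" PARENT_SCOPE)
    return()
  endif()
endfunction()
```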

Edit

Yep, passing -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler" at CMake configuration time fixes my standalone test, so:

  1. Add a warning: if CMake language detection fails, the OS is Windows, the MSVC version is >= the bad one and CUDA < 12.4, emit a warning suggesting users reconfigure with -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler", due to an incorrect assumption in CUDA's compiler version checking.
     • Not certain whether this also requires a newer CMake; I'd really need to test locally. CMake in the Windows Actions image is currently 3.30.
  2. Add -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler" to our Windows CI workflows (possibly only for specific CUDA versions, but allowing it for all of them would keep things simpler).

The final CI runs I set going on Friday confirm that CUDA <= 12.3 with -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler" does require CMake >= 3.29.4 (3.29.3 fails and 3.29.6 passes; these are the closest releases not yanked from pip).
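The CMake side of the requirement is a plain version gate. A small sketch follows; the helper name is hypothetical and the 3.29.4 threshold comes from the MR and CI runs above. Using `VERSION_*` comparisons rather than string comparison matters here, because as strings `3.29.10` sorts before `3.29.4`.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical helper: TRUE if this CMake is new enough (>= 3.29.4) for a
# user-supplied --allow-unsupported-compiler to be used during CUDA detection.
function(example_cmake_honours_cuda_flags cmake_version out_var)
  # VERSION_* compares each component numerically, so 3.29.10 > 3.29.4.
  if(cmake_version VERSION_GREATER_EQUAL 3.29.4)
    set(${out_var} TRUE PARENT_SCOPE)
  else()
    set(${out_var} FALSE PARENT_SCOPE)
  endif()
endfunction()

example_cmake_honours_cuda_flags("${CMAKE_VERSION}" cmake_ok)
if(NOT cmake_ok)
  message(WARNING
    "CMake ${CMAKE_VERSION} is older than 3.29.4; -DCMAKE_CUDA_FLAGS=\"--allow-unsupported-compiler\" "
    "is not expected to fix CUDA detection with MSVC_VERSION >= 1940 and CUDA < 12.4.")
endif()
```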

So (ideally) the warning needs to be emitted when:

  • CUDA language support detection fails
  • And using MSVC
  • And MSVC_VERSION >= 1940 (we could err on the side of caution and also require MSVC_VERSION < 1950, but this might break if VS 2022 gets many more versions)
  • And the attempted CUDA version is < 12.4 (if this is accessible during detection; not sure it is)
  • And the user has not already set --allow-unsupported-compiler (otherwise the suggested fix would not help)
  • If CMAKE_VERSION < 3.29.4, the warning will need to be different.
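The conditions above could be collected into one decision function. This is a sketch only: the name and the returned `NONE` / `SET_FLAG` / `UPDATE_CMAKE_AND_SET_FLAG` values are hypothetical, and inputs are passed explicitly so it can be run with `cmake -P`. In the real check they would come from `CMAKE_CUDA_COMPILER`, `MSVC`, `MSVC_VERSION`, `CMAKE_CUDA_FLAGS`, `CMAKE_VERSION` and a CUDA version if one is available; since that may not be known when detection fails, an empty CUDA version is treated as possibly affected.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical helper returning which hint (if any) to emit.
function(example_msvc1940_hint detection_failed is_msvc msvc_version cuda_version
         cuda_flags cmake_version out_var)
  set(result "NONE")
  if(detection_failed AND is_msvc AND msvc_version GREATER_EQUAL 1940)
    # Empty cuda_version means "unknown", so warn anyway.
    if(cuda_version STREQUAL "" OR cuda_version VERSION_LESS 12.4)
      # If the flag is already set, suggesting it again would not help.
      if(NOT cuda_flags MATCHES "allow-unsupported-compiler")
        if(cmake_version VERSION_LESS 3.29.4)
          set(result "UPDATE_CMAKE_AND_SET_FLAG")
        else()
          set(result "SET_FLAG")
        endif()
      endif()
    endif()
  endif()
  set(${out_var} "${result}" PARENT_SCOPE)
endfunction()

# Example with inputs matching the failing log (CUDA 11.7, MSVC 19.40).
example_msvc1940_hint(TRUE TRUE 1940 11.7 "" "${CMAKE_VERSION}" hint)
message(STATUS "hint: ${hint}")
```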

It should tell users that:

  • There is a known issue with CUDA < 12.4 and VS 2022 with MSVC_VERSION >= 1940, which has several workarounds:
  • Specify -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler"
    • And update to CMake >= 3.29.4 if required
  • Update to CUDA >= 12.4 (probably not ideal given RTC performance)
  • Downgrade Visual Studio 2022 to whichever release has MSVC_VERSION 1939 (not sure how easy this is; probably better to update CMake and set the flag)
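A sketch of how the warning text could be assembled, switching the CMake advice on `CMAKE_VERSION`. The helper name and exact wording are assumptions; the 17.9 ↔ 1939 mapping is the VS 2022 release that precedes the 19.40 compiler.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical helper that builds the suggested warning text.
#   msvc_version      - MSVC_VERSION of the C++ compiler (e.g. 1940)
#   need_cmake_update - TRUE if CMAKE_VERSION is older than 3.29.4
function(example_msvc1940_message msvc_version need_cmake_update out_var)
  string(CONCAT text
    "CUDA language support was not found. This may be the known issue with "
    "CUDA < 12.4 and Visual Studio 2022 with MSVC_VERSION ${msvc_version} (>= 1940). "
    "Options:\n"
    "  * Reconfigure with -DCMAKE_CUDA_FLAGS=\"--allow-unsupported-compiler\"")
  if(need_cmake_update)
    string(APPEND text ", after updating to CMake >= 3.29.4")
  endif()
  string(APPEND text "\n"
    "  * Update to CUDA >= 12.4\n"
    "  * Downgrade Visual Studio 2022 to a release with MSVC_VERSION 1939 (17.9)")
  set(${out_var} "${text}" PARENT_SCOPE)
endfunction()

# Example: MSVC 19.40 with a CMake older than 3.29.4.
example_msvc1940_message(1940 TRUE hint_text)
message(WARNING "${hint_text}")
```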

It could also be worth adding this to the installation instructions for MSVC users, or somewhere similar. It's not our issue; it's a generic CUDA < 12.4 && MSVC_VERSION >= 1940 (but still VS 2022) issue.

It would be worth updating our CI to include CUDA 11.x, 12.0 & 12.5 builds at the same time to cover this.