pybind11-cuda

CMake+nvcc+msvc==pure_chaos. I learned it the hard way so you don't have to.

Starting point for GPU-accelerated Python libraries.

This project uses the modern CMake/CUDA approach.

Prerequisites

CUDA

Python 3.6 or greater

CMake >= 3.18 (for CUDA support and the new FindPython3 module)

You can use the CMAKE_CUDA_ARCHITECTURES variable instead of CUDAFLAGS:

mkdir build; cd build
# provide a default CUDA hardware architecture to build for (75 = compute capability 7.5, Turing)
cmake -DCMAKE_CUDA_ARCHITECTURES="75" ..
make
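
If you are unsure which value to pass, the number is the GPU's compute capability with the dot removed (7.5 becomes 75). The helper below is a minimal sketch and not part of this repository: the function names are hypothetical, and it assumes a driver recent enough that nvidia-smi supports the compute_cap query field.

```python
import subprocess


def cuda_arch_from_compute_cap(compute_cap: str) -> str:
    """Convert a compute capability such as '7.5' into the '75' form CMake expects."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def format_architectures(smi_output: str) -> str:
    """Turn nvidia-smi output (one compute capability per line) into a
    semicolon-separated, de-duplicated list for CMAKE_CUDA_ARCHITECTURES."""
    archs = {
        cuda_arch_from_compute_cap(line)
        for line in smi_output.splitlines()
        if line.strip()
    }
    return ";".join(sorted(archs, key=int))


def detect_cuda_architectures() -> str:
    """Query every visible GPU. Assumes nvidia-smi is on PATH and
    understands --query-gpu=compute_cap (an assumption about your driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return format_architectures(result.stdout)
```

Calling detect_cuda_architectures() on the build machine gives a string you can paste into -DCMAKE_CUDA_ARCHITECTURES="...". On a machine with mixed GPUs the result lists each architecture once, separated by semicolons, which is the list syntax CMake accepts for this variable.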

Test it with python3 ./src/test_cxx_module.py

Compiles out of the box with CMake, even on Windows with MSVC
Easy-to-modify demos written in modern C++, using libraries such as Thrust and CUTLASS
NumPy integration
C++ templating for composable kernels with generic data types

The search order for cuDNN in CUTLASS is a bit surprising as of v2.10.0. It is recommended to copy your desired version of cuDNN into your current CUDA directory, and to take note of the cuDNN path that CUTLASS's CMakeLists.txt reports as detected.