
Primary language: CMake. License: GNU General Public License v3.0 (GPL-3.0).

gemm-pybind-learning

This repo demonstrates how to use pybind11 to provide a Python interface for highly optimized CUDA code. It contains only basic functionality. The present work uses modern CMake/CUDA and follows Yujia Zhai's GEMV implementation approach. The CMakeLists.txt comes from pkestene.

Build and Install

This project requires CMake >= 3.18. It can be built with the commands below:

git clone --recurse-submodules https://github.com/dongdongban/gemm-pybind-learning
cd gemm-pybind-learning
cmake -S . -B build -DCMAKE_CUDA_ARCHITECTURES="75" && cd build
cmake --build .

The architecture value "75" (compute capability 7.5, i.e. sm_75) should be replaced with the compute capability of your own GPU, written without the dot.
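If you are unsure which value to pass, the compute capability can be converted to the CMake form by dropping the dot. The sketch below is only an illustration: the helper names `to_cmake_arch` and `detect_cmake_archs` are hypothetical and not part of this repo, and the detection step assumes an NVIDIA driver recent enough that `nvidia-smi` supports the `compute_cap` query field.

```python
import subprocess


def to_cmake_arch(compute_cap):
    """Convert a compute capability such as "7.5" to the CMake value "75"."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_cmake_archs():
    """Return the distinct architectures of the GPUs reported by nvidia-smi.

    Assumption: the installed driver supports the `compute_cap` query field.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    archs = {to_cmake_arch(line) for line in out.splitlines() if line.strip()}
    return sorted(archs, key=int)


if __name__ == "__main__":
    try:
        # CMake accepts a semicolon-separated list of architectures.
        print(";".join(detect_cmake_archs()))
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"could not query nvidia-smi: {exc}")
```

The printed value can then be passed as `-DCMAKE_CUDA_ARCHITECTURES="<value>"` in the configure step.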

Verification

If nothing went wrong, check the module with these commands:

cd Optimizing-SGEMV-on-NVIDIA-GPUs
python -c 'import mygemm; mygemm.host(4096, 4096, 1); mygemm.host(4096, 4096, 2)'
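The same check can also be written as a short script that reports a clearer message when the module cannot be imported. This is a minimal sketch, not part of the repo: `run_checks` and `main` are hypothetical helpers, and the only thing assumed about `mygemm.host` is that it accepts the three positional integers used in the one-liner above.

```python
import importlib
import sys

# Argument tuples copied from the one-liner; their meaning is defined by the
# module itself and is not assumed here.
CALLS = [(4096, 4096, 1), (4096, 4096, 2)]


def run_checks(module, calls=CALLS):
    """Call module.host(*args) for each tuple and collect the return values."""
    return [module.host(*args) for args in calls]


def main():
    try:
        # Run this from the directory that contains the built extension.
        mygemm = importlib.import_module("mygemm")
    except ImportError as exc:
        print(f"could not import mygemm: {exc}", file=sys.stderr)
        return 1
    run_checks(mygemm)
    return 0


if __name__ == "__main__":
    main()
```

Save it next to the built module and run it with `python`; an import failure usually means the script was started from a different directory than the one holding the compiled extension.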