Note that this repository is under active development.
Section | Videos | Code |
---|---|---|
01 | Episode 1: Configuring a cross-platform CUDA development environment with CuPy | course01_hello_cuda |
- ...
- ...
Thanks to the following excellent public learning resources.
- codingonion/awesome-cuda-tensorrt-fpga : A collection of some awesome public NVIDIA CUDA, TensorRT, AMD ROCm and FPGA projects.
- codingonion/cuda-beginner-course-cpp-version : Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (C++ Edition)".
- codingonion/cuda-beginner-course-rust-version : Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (Rust Edition)".
- codingonion/cuda-beginner-course-python-version : Companion code for the bilibili video series "CUDA 12.1 Parallel Programming for Beginners (Python Edition)".
- NVIDIA CUDA Docs : CUDA Toolkit Documentation.
- NVIDIA/cuda-samples : Samples for CUDA developers that demonstrate features in the CUDA Toolkit.
- NVIDIA/CUDALibrarySamples : CUDA Library Samples.
- HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese : A Chinese translation of the CUDA C Programming Guide.
- brucefan1983/CUDA-Programming : Sample codes for my CUDA programming book.
- YouQixiaowu/CUDA-Programming-with-Python : Python sample code for the book CUDA Programming, written with the pycuda module.
- QINZHAOYU/CudaSteps : A CUDA learning path based on the book 《cuda编程-基础与实践》 ("CUDA Programming: Basics and Practice") by 樊哲勇.
- sangyc10/CUDA-code : Companion code for the bilibili video tutorial "CUDA Programming Basics for Beginners (continuously updated)".
- RussWong/CUDATutorial : A CUDA tutorial for learning CUDA programming from scratch.
- DefTruth/cuda-learn-note : 🎉 CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated occasionally: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
- Liu-xiandong/How_to_optimize_in_GPU : This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
- enp1s0/ozIMMU : FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme. arxiv.org/abs/2306.11975
- Bruce-Lee-LY/matrix_multiply : Several common methods of matrix multiplication implemented on CPU and NVIDIA GPU using C++11 and CUDA.
- Bruce-Lee-LY/cuda_hgemm : Several optimization methods of half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
- Bruce-Lee-LY/cuda_hgemv : Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA cores.
- Cjkkkk/CUDA_gemm : A simple high-performance CUDA GEMM implementation.
- AyakaGEMM/Hands-on-GEMM : A GEMM tutorial.
- zpzim/MSplitGEMM : Large matrix multiplication in CUDA.
- jundaf2/CUDA-INT8-GEMM : CUDA 8-bit Tensor Core matrix multiplication based on the m16n16k16 WMMA API.
- chanzhennan/cuda_gemm_benchmark : Based on gtest/benchmark; refers to https://github.com/Liu-xiandong/How_to_optimize_in_GPU.
- YuxueYang1204/CudaDemo : Implement custom operators in PyTorch with CUDA/C++.
- CoffeeBeforeArch/cuda_programming : Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch.
- rbaygildin/learn-gpgpu : Algorithms implemented in CUDA + resources about GPGPU.
- PacktPublishing/Learn-CUDA-Programming : Learn CUDA Programming, published by Packt.
- PacktPublishing/Hands-On-GPU-Accelerated-Computer-Vision-with-OpenCV-and-CUDA : Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt.
- PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA : Hands-On GPU Programming with Python and CUDA, published by Packt.