cuda-programming
There are 400 repositories under the cuda-programming topic.
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
NVIDIA/cccl
CUDA Core Compute Libraries
eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
coreylowman/cudarc
Safe Rust wrapper around the CUDA toolkit
harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA high-performance programming.
nosferalatu/SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock free techniques
jaredhoberock/stanford-cs193g-sp2010
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
HMUNACHI/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
SunsetQuest/CudaPAD
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
ROCm/HIP-CPU
An implementation of HIP that works on CPUs, across OSes.
eyalroz/cuda-kat
CUDA kernel author's tools
tgautam03/xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
FahimFBA/CUDA-WSL2-Ubuntu
Install CUDA on Windows 11 using WSL2
mikeroyal/CUDA-Guide
CUDA Guide
emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
HuangCongQing/cuda-learning
Getting started with learning CUDA programming
Accelsnow/gaussian-splatting-distwar
DISTWAR atomic reduction optimization on "3D Gaussian Splatting for Real-Time Radiance Field Rendering".
LinhanDai/yolov9-tensorrt
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python 🔥🔥🔥
coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
xmba15/ransac_lines_fitting_gpu
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds
Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
fjramireg/StiffMa
StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.
flin3500/Cuda-Google-Colab
CUDA code mainly targets NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.
Koushikphy/Intro-to-CUDA-Fortran
A complete beginner's introduction to programming with CUDA Fortran
ashvardanian/cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
Accelsnow/diff-gaussian-rasterization-distwar
DISTWAR-enabled rasterization engine for the paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
AhmetFurkanDEMIR/NVIDIA-GPU-benchmark
NVIDIA GPU benchmark
jerry060599/KittenGpuLBVH
A high-performance and friendly GPU LBVH implementation.
YichengDWu/FlashAttention.jl
Julia implementation of the Flash Attention algorithm
priteshgohil/CUDA-programming-tutorial
Get started with CUDA programming
DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).