cuda-programming
There are 501 repositories under the cuda-programming topic.
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Rust-GPU/rust-cuda
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
NVIDIA/cccl
CUDA Core Compute Libraries
brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
coreylowman/cudarc
Safe Rust wrapper around the CUDA toolkit
mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA high-performance programming.
yassa9/qwen600
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine
nosferalatu/SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock free techniques
jaredhoberock/stanford-cs193g-sp2010
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
HenryNdubuaku/cuda-tutorials
CUDA tutorials for maths and ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.
MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
tgautam03/xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
ROCm/HIP-CPU
An implementation of HIP that works on CPUs, across OSes.
SunsetQuest/CudaPAD
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
eyalroz/cuda-kat
CUDA kernel author's tools
emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA for image handling or TensorRT inference
goabiaryan/awesome-gpu-engineering
GPU Engineering for AI Systems
mikeroyal/CUDA-Guide
CUDA Guide
FahimFBA/CUDA-WSL2-Ubuntu
Install CUDA on Windows11 using WSL2
Accelsnow/gaussian-splatting-distwar
DISTWAR atomic reduction optimization on "3D Gaussian Splatting for Real-Time Radiance Field Rendering".
HuangCongQing/cuda-learning
An introduction to learning CUDA programming
jerry060599/KittenGpuLBVH
A high performance and friendly GPU LBVH implementation.
toxy4ny/artaxerxes
Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK, eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs
LinhanDai/yolov9-tensorrt
YOLOv9 TensorRT deployment acceleration, providing two implementation methods: C++ and Python 🔥🔥🔥
Koushikphy/Intro-to-CUDA-Fortran
A complete beginner's introduction to programming with CUDA Fortran
ashvardanian/PyBindToGPUs
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
xmba15/ransac_lines_fitting_gpu
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds
KarhouTam/cuda-kernels
Some common CUDA kernel implementations (Not the fastest).
fjramireg/StiffMa
StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.
Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
flin3500/Cuda-Google-Colab
CUDA code mainly targets NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.