kernel-fusion
There are 9 repositories under the kernel-fusion topic.
tracel-ai/burn
Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, and portability.
chhzh123/Krill
An efficient concurrent graph processing system
wu-kan/GoPTX
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
nopperl/pytorch-fused-lamb
LAMB go brrr
ParCoreLab/gpu-fusion
GPU fusion code and algorithm
ShkalikovOleh/alpaka_expr_trees
Compile-time kernel fusion and expression trees as an Alpaka boost.odeint backend. This is my team project, developed in collaboration with and under the supervision of HZDR.
fraidakis/PDS_BitonicSortCUDA
Parallel and Distributed Systems - Exercise 3
JonSnow1807/Fused-LayerNorm-CUDA-Operator
High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.
Nullvora/mabor
Mabor is a cutting-edge deep learning framework built for flexibility, efficiency, and portability—without compromise.