kernel-fusion

There are 9 repositories under the kernel-fusion topic.

  • tracel-ai/burn

    Burn is a next-generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency, and portability.

    Language: Rust
  • chhzh123/Krill

    An efficient concurrent graph processing system

    Language: C++
  • wu-kan/GoPTX

    GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving

    Language: HTML
  • nopperl/pytorch-fused-lamb

    LAMB go brrr

    Language: Python
  • ParCoreLab/gpu-fusion

    GPU fusion code and algorithm

    Language: Cuda
  • ShkalikovOleh/alpaka_expr_trees

    Compile-time kernel fusion and expression trees as an Alpaka boost.odeint backend. This is my team project, developed in collaboration with and under the supervision of HZDR.

    Language: C++
  • fraidakis/PDS_BitonicSortCUDA

    Parallel and Distributed Systems - Exercise 3

    Language: Cuda
  • JonSnow1807/Fused-LayerNorm-CUDA-Operator

    High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.

    Language: Python
  • Nullvora/mabor

    Mabor is a cutting-edge deep learning framework built for flexibility, efficiency, and portability—without compromise.

    Language: Rust