Examples of CUDA implementations by Cutlass CuTe

Primary language: Makefile. License: MIT.

Cute-Learning

Welcome to the Cute-Learning repository! This project collects example CUDA implementations written with CuTe, the layout and tensor abstraction library that ships with NVIDIA CUTLASS.

Features

This repository includes implementations for:

  • GEMM (General Matrix Multiply)
  • GEMV (General Matrix-Vector Multiply)
  • Flash-Decoding
  • Data Copy
  • LDSM (ldmatrix instruction)
  • Tensor Dequant
  • TODO... (More features to come!)

GEMM

The GEMM example is tuned for performance. The graph below shows its measured performance:

[Figure: GEMM Performance]
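
When tuning a GEMM kernel, a slow but obviously correct host-side reference is useful for checking results before trusting any performance numbers. The following is a minimal sketch in plain C++, not code from this repository: the names `reference_gemm` and `max_abs_diff` are hypothetical, and it assumes row-major `float` matrices, which may differ from the layouts the CuTe kernels here actually use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: C (M x N) = A (M x K) * B (K x N).
// All matrices are assumed to be row-major and densely packed.
std::vector<float> reference_gemm(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = 0; j < N; ++j) {
                C[i * N + j] += a * B[k * N + j];
            }
        }
    }
    return C;
}

// Largest element-wise absolute difference between two equally sized buffers,
// e.g. a GPU result copied back to the host and the reference above.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::fmax(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

A typical check copies the kernel output back to the host and compares `max_abs_diff(gpu_result, reference_gemm(A, B, M, N, K))` against a tolerance chosen for the data type; half-precision inputs generally need a looser bound than `float`.
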

Refer to the following blog:

LDSM
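
The `ldmatrix` PTX instruction loads 8x8 tiles of 16-bit elements from shared memory and distributes them across the 32 threads of a warp. A host-side C++ model of the single-tile (`.x1`, non-transposed) case can make the resulting register layout easier to see. This is a sketch under stated assumptions, not code from this repository: `Fragment` and `emulate_ldmatrix_x1` are hypothetical names, and the function runs on the CPU rather than issuing the real instruction.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The two 16-bit elements that one lane receives in its 32-bit register.
struct Fragment {
    std::uint16_t lo;  // element at the even column
    std::uint16_t hi;  // element at the following odd column
};

// Hypothetical CPU model of ldmatrix .m8n8 .x1 with 16-bit elements.
// row_ptrs[r] points at the 8 contiguous elements of row r; on the GPU these
// eight addresses are supplied by lanes 0..7 of the warp.
std::array<Fragment, 32> emulate_ldmatrix_x1(
    const std::array<const std::uint16_t*, 8>& row_ptrs) {
    std::array<Fragment, 32> frag{};
    for (int lane = 0; lane < 32; ++lane) {
        const int row = lane / 4;        // four lanes share one row
        const int col = 2 * (lane % 4);  // each lane takes two adjacent columns
        assert(row_ptrs[row] != nullptr);
        frag[lane] = Fragment{row_ptrs[row][col], row_ptrs[row][col + 1]};
    }
    return frag;
}
```

For an 8x8 tile filled with `tile[r][c] = 8 * r + c`, lane 5 ends up holding the values 10 and 11 (row 1, columns 2 and 3). Because each row is addressed separately, the rows do not need to be adjacent in shared memory, which is what lets swizzled layouts avoid bank conflicts.
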

Refer to the following blog:


We hope you find this repository useful for your learning and development needs. Contributions and feedback are welcome!