MUTLASS 0.1.1

MUTLASS 0.1.1 - September 2024

MUTLASS (MUSA Templates for Linear Algebra Subroutines) is a header-only library for implementing high-performance matrix-matrix multiplication (GEMM) within MUSA (Meta-computing Unified System Architecture). It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement muDNN.
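
To make "hierarchical decomposition" concrete, here is a CPU-only C++ sketch of a tiled GEMM that splits C = A * B + C into output block tiles, slices along K, and small per-thread sub-tiles. It is an illustration under stated assumptions, not MUTLASS code: the function `tiled_gemm` and the tile constants are hypothetical, matrices are assumed row-major, and a real MUSA kernel would run the tile loops in parallel and stage tiles through on-chip memory instead of looping sequentially.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes (not MUTLASS parameters).
constexpr std::size_t kBlockM = 8, kBlockN = 8, kBlockK = 4;  // outer "block" tile
constexpr std::size_t kThreadM = 2, kThreadN = 2;             // inner "thread" tile
static_assert(kBlockM % kThreadM == 0 && kBlockN % kThreadN == 0,
              "thread tiles must evenly divide block tiles");

// Computes C = A * B + C. Row-major: A is M x K, B is K x N, C is M x N.
void tiled_gemm(std::size_t M, std::size_t N, std::size_t K,
                const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C) {
  // Level 1: each output block tile (what one thread block would own).
  for (std::size_t bm = 0; bm < M; bm += kBlockM)
    for (std::size_t bn = 0; bn < N; bn += kBlockN)
      // Level 2: walk K in slices (what would be staged in shared memory).
      for (std::size_t bk = 0; bk < K; bk += kBlockK)
        // Level 3: split the block tile into small per-thread sub-tiles.
        for (std::size_t tm = bm; tm < std::min(bm + kBlockM, M); tm += kThreadM)
          for (std::size_t tn = bn; tn < std::min(bn + kBlockN, N); tn += kThreadN)
            // Level 4: each "thread" accumulates its sub-tile over the K slice.
            for (std::size_t i = tm; i < std::min(tm + kThreadM, M); ++i)
              for (std::size_t j = tn; j < std::min(tn + kThreadN, N); ++j) {
                float acc = C[i * N + j];
                for (std::size_t k = bk; k < std::min(bk + kBlockK, K); ++k)
                  acc += A[i * K + k] * B[k * N + j];
                C[i * N + j] = acc;
              }
}
```

The result matches a naive triple loop up to floating-point rounding order; tiling only changes the order in which partial products are accumulated, which is what lets a GPU kernel reuse each loaded tile many times.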

See the Quick Start Guide to get started quickly.

Note: MUTLASS uses the CuTe library, introduced in CUTLASS 3.x, as its backend, and is therefore incompatible with most code written against CUTLASS 2.x.

What's in MUTLASS 0.1.1

MUTLASS 0.1.1 is an open release based on CUTLASS 3.5 that provides:

- MuTe, a core library and backend adapted from CUTLASS CuTe
- Quyuan Features
  - MMA primitives: TensorFloat32, BFloat16, Float16, INT8
- FMA/MMA GEMM Kernels targeting the Quyuan architecture
  - Note: this is a beta release. Further updates to MUTLASS will include performance improvements, feature enablement, and possible breaking changes to the API.
- MUTLASS Profiler, Library, and Utilities
- Two examples that demonstrate the usage of the low-level API and the collective builders to build GEMM kernels
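
For readers new to MMA primitives, the sketch below is a scalar reference model of what a single matrix-multiply-accumulate step computes: D = A * B + C over a small fixed-size tile. The 4x4x4 tile shape, the type aliases, and `mma_int8_reference` are hypothetical and are not the actual Quyuan instruction shapes or MuTe APIs; the INT8 case assumes 32-bit integer accumulation, as is typical for INT8 MMA hardware.

```cpp
#include <array>
#include <cstdint>

// Hypothetical tile shape for illustration only.
constexpr int kM = 4, kN = 4, kK = 4;

using TileA = std::array<std::int8_t, kM * kK>;   // row-major M x K
using TileB = std::array<std::int8_t, kK * kN>;   // row-major K x N
using TileC = std::array<std::int32_t, kM * kN>;  // row-major M x N accumulator

// D = A * B + C. INT8 operands are widened to 32 bits before multiplying so
// that the sum of K products cannot overflow the accumulator for this tile size.
TileC mma_int8_reference(const TileA& a, const TileB& b, const TileC& c) {
  TileC d = c;
  for (int i = 0; i < kM; ++i)
    for (int j = 0; j < kN; ++j)
      for (int k = 0; k < kK; ++k)
        d[i * kN + j] += std::int32_t(a[i * kK + k]) * std::int32_t(b[k * kN + j]);
  return d;
}
```

An FMA kernel performs one scalar multiply-add per instruction, whereas an MMA primitive performs the whole tile update above as a single hardware operation, which is why MMA-based kernels reach much higher throughput.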

Minimum requirements:

Architecture: Quyuan
Compiler: MCC 3.1.0
MUSA Toolkit version: 3.1.0

Documentation

Quick Start Guide - build and run MUTLASS

Building MUTLASS

MUTLASS is a header-only template library and does not need to be built to be used by other projects. Client applications should target MUTLASS's include/ directory in their include paths.

MUTLASS unit tests, examples, and utilities can be built with CMake. The minimum required version of CMake is given in the Quick Start Guide.

Create a build directory within the MUTLASS project, then run CMake. By default, MUTLASS builds kernels for MUSA architecture version 2.2.
