TT430's Stars
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
luizinhosuraty/pciemu
PCIe Device Emulation in QEMU
Arteris-IP/tlm2-interfaces
Contains TLM2-based interfaces for AXI, ACE, CHI, and other standard protocols
mmxsrup/axi4-interface
AXI4 and AXI4-Lite interface definitions
PrincetonUniversity/LLMCompass
srush/Triton-Puzzles
Puzzles for learning Triton
FlagOpen/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
microsoft/triton-shared
Shared Middle-Layer for Triton Compilation
Cambricon/triton-linalg
Development repository for the Triton-Linalg conversion
openhwgroup/cvfpu
Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats.
ggerganov/ggml
Tensor library for machine learning
srush/annotated-mamba
Annotated version of the Mamba paper
state-spaces/mamba
Mamba SSM architecture
XUANTIE-RV/riscv-matrix-extension-spec
A matrix extension proposal for AI applications under RISC-V architecture
radarFudan/Awesome-state-space-models
Collection of papers on state-space models
Clo91eaf/libspike-interfaces
cocotb/cocotb
cocotb, a coroutine-based cosimulation library for writing VHDL and Verilog testbenches in Python
BUAA-CI-LAB/Literatures-on-SRAM-based-CIM
A reading list for SRAM-based Compute-In-Memory (CIM) research.
mfem/mfem
Lightweight, general, scalable C++ library for finite element methods
openai/blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
tenstorrent/tt-metal
TT-NN operator library and TT-Metalium low-level kernel programming model.
tenstorrent/tt-buda
Tenstorrent TT-BUDA Repository
CMU-SAFARI/prim-benchmarks
PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is developed to evaluate, analyze, and characterize the first publicly-available real-world PIM architecture, the UPMEM PIM architecture. Described by Gómez-Luna et al. (https://arxiv.org/abs/2105.03814).
VIA-Research/uPIMulator
ilyakurdyukov/libminiomp
Minimal implementation of the OpenMP runtime library.
georgia-tech-synergy-lab/SIGMA
RTL implementation of Flex-DPE.
tsinghua-ideal/spada-sim
The simulator for SPADA, an SpGEMM accelerator with adaptive dataflow
ucb-bar/gemmini
Berkeley's Spatial Array Generator
sfu-arch/SpGEMM
bsc-pm/nanox
Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, an extension to OpenMP developed at BSC.