TT430

Tsinghua University

TT430's Stars

Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language: Python 14.8k 1.4k
luizinhosuraty/pciemu
PCIe Device Emulation in QEMU
Language: C 5314
Arteris-IP/tlm2-interfaces
contains TLM2 based interfaces for AXI, ACE, CHI and other standard protocols
Language: C++ 539
mmxsrup/axi4-interface
AXI4 and AXI4-Lite interface definitions
Language: SystemVerilog 8627
PrincetonUniversity/LLMCompass
Language: Python 9725
srush/Triton-Puzzles
Puzzles for learning Triton
Language: Jupyter Notebook 1.2k 93
FlagOpen/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
Language: Python 37557
microsoft/triton-shared
Shared Middle-Layer for Triton Compilation
Language: MLIR 21248
Cambricon/triton-linalg
Development repository for the Triton-Linalg conversion
Language: C++ 16215
openhwgroup/cvfpu
Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats.
Language: SystemVerilog 443117
ggerganov/ggml
Tensor library for machine learning
Language: C++ 11.4k 1.1k
srush/annotated-mamba
Annotated version of the Mamba paper
Language: Jupyter Notebook 46318
state-spaces/mamba
Mamba SSM architecture
Language: Python 13.6k 1.2k
XUANTIE-RV/riscv-matrix-extension-spec
A matrix extension proposal for AI applications under RISC-V architecture
Language: Makefile 11921
radarFudan/Awesome-state-space-models
Collection of papers on state-space models
56320
Clo91eaf/libspike-interfaces
Language: Nix 2
cocotb/cocotb
cocotb, a coroutine based cosimulation library for writing VHDL and Verilog testbenches in Python
Language: Python 1.8k 526
BUAA-CI-LAB/Literatures-on-SRAM-based-CIM
A reading list for SRAM-based Compute-In-Memory (CIM) research.
402
mfem/mfem
Lightweight, general, scalable C++ library for finite element methods
Language: C++ 1.8k 506
openai/blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Language: Cuda 1k 202
tenstorrent/tt-metal
TT-NN operator library, and TT-Metalium low level kernel programming model.
Language: C++ 57287
tenstorrent/tt-buda
Tenstorrent TT-BUDA Repository
Language: Python 26737
CMU-SAFARI/prim-benchmarks
PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is developed to evaluate, analyze, and characterize the first publicly-available real-world PIM architecture, the UPMEM PIM architecture. Described by Gómez-Luna et al. (https://arxiv.org/abs/2105.03814).
Language: C 14450
VIA-Research/uPIMulator
Language: C 10814
ilyakurdyukov/libminiomp
Minimal implementation of the OpenMP runtime library.
Language: C 111
georgia-tech-synergy-lab/SIGMA
RTL implementation of Flex-DPE.
Language: Verilog 9328
tsinghua-ideal/spada-sim
The simulator for SPADA, an SpGEMM accelerator with adaptive dataflow
Language: Rust 314
ucb-bar/gemmini
Berkeley's Spatial Array Generator
Language: Scala 834177
sfu-arch/SpGEMM
Language: Verilog 332
bsc-pm/nanox
Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, an extension to OpenMP developed at BSC.
Language: C++ 3815
