Paran0idy's Stars
mindspore-ai/akg
AKG (Auto Kernel Generator) is an optimizer for operators in Deep Learning Networks, which provides the ability to automatically fuse ops with specific patterns.
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
FlagOpen/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
MarioLulab/Needle
A basic deep learning library, comparable to a very minimal version of PyTorch.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
daattali/beautiful-jekyll
✨ Build a beautiful and simple website in literally minutes. Demo at https://beautifuljekyll.com
isocpp/CppCoreGuidelines
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
cornell-zhang/allo
Allo: A Programming Model for Composable Accelerator Design
Cambricon/triton-linalg
Development repository for the Triton-Linalg conversion
cuda-mode/lectures
Material for cuda-mode lectures
KnowingNothing/MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
SForeKeeper/buddy-mlir
An MLIR-Based Ideas Landing Project
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
chenzomi12/AISystem
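AISystem mainly refers to AI systems, including AI chips, AI compilers, AI inference and training frameworks, and other full-stack low-level AI technologies.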
wuye9036/CppTemplateTutorial
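A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, to help readers gain a thorough grasp of metaprogramming. (Work in progress)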
buddy-compiler/buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
xtekky/gpt4free
The official gpt4free repository | a collection of various powerful language models
llvm/torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
cuda-mode/resource-stream
CUDA related news and material links
srush/Triton-Puzzles
Puzzles for learning Triton
unslothai/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Deep-Learning-Profiling-Tools/triton-viz
Jokeren/Awesome-GPU
Awesome resources for GPUs
gfvvz/Triton-Compiler
Triton Compiler related materials.
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
ROCm/aotriton
Ahead of Time (AOT) Triton Math Library
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.