MeeCreeps

MeeCreeps's Stars

RWKV/rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Language: C++, 1.4k stars, 93 forks
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
57524
microsoft/ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
Language: C++ 15822
google/uVkCompute
A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
Language: C++ 21736
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
Language: C++ 43932
pytorch/torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
Language: Python, 3.1k stars, 193 forks
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
Language: C++ 58478
flame/how-to-optimize-gemm
Language: C, 1.7k stars, 351 forks
google/spirv-tutor
Language: Shell 508
SaschaWillems/VulkanCapsViewer
Vulkan hardware capability viewer
Language: C++ 31266
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Language: Python 40118
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
Language: Python, 18.7k stars, 1.5k forks
Mozilla-Ocho/llamafile
Distribute and run LLMs with a single file.
Language: C++, 19k stars, 971 forks
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Language: Python, 2.5k stars, 185 forks
Syllo/nvtop
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
Language: C, 8k stars, 291 forks
FMInference/DejaVu
Language: Python 26932
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language: Python, 13.4k stars, 1.2k forks
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
Language: C++, 5.8k stars, 882 forks
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Language: C++, 7.9k stars, 403 forks
lutzroeder/netron
Visualizer for neural network, deep learning and machine learning models
Language: JavaScript, 27.6k stars, 2.7k forks
edufgf/CodeOptimizationTechniques
Implementation and benchmark of optimization techniques and algorithms applied to the Matrix Multiplication problem on a CPU/GPU multithreaded environment.
Language: C++ 81
flame/blislab
BLISlab: A Sandbox for Optimizing GEMM
Language: C 46799
ggerganov/ggml
Tensor library for machine learning
Language: C++, 10.9k stars, 1k forks
alibaba/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Language: C++, 8.6k stars, 1.7k forks
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Language: Jupyter Notebook, 10.8k stars, 1.1k forks
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Language: Python, 5.5k stars, 504 forks
flowtyone/flowty-realtime-lcm-canvas
A realtime sketch to image demo using LCM and the gradio library.
Language: Python, 1.8k stars, 149 forks
CNugteren/CLBlast
Tuned OpenCL BLAS
Language: C++, 1k stars, 205 forks
ermig1979/Simd
C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
Language: C++, 2k stars, 407 forks
chenzomi12/AISystem
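AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks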
Language: Jupyter Notebook, 10.4k stars, 1.5k forks