amo33's Stars
ggerganov/llama.cpp
LLM inference in C/C++
YavorGIvanov/sam.cpp
wzsh/wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Cores)
mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
pytorch/PiPPy
Pipeline Parallelism for PyTorch
pytorch-labs/applied-ai
Applied AI experiments and examples for PyTorch
htqin/awesome-model-quantization
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. PRs adding works (papers, repositories) the repo has missed are welcome.
carlushuang/cpu_gemm_opt
How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.
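The core idea behind a fast CPU GEMM, before any AVX intrinsics, is blocking: partition the matrices into tiles that fit in cache so each loaded element is reused many times. A minimal sketch of that blocking structure in NumPy (the function name and tile size are illustrative, not from the repo):

```python
import numpy as np

def blocked_gemm(A, B, tile=64):
    """Cache-blocked matrix multiply: C = A @ B.

    Tiling keeps a tile x tile working set of A, B, and C hot in
    cache; optimized GEMMs apply the same blocking again at the
    register level inside a vectorized (e.g. AVX) micro-kernel.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for k in range(0, K, tile):
            for j in range(0, N, tile):
                # accumulate one output tile from one pair of input tiles
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C
```

The loop order (i, k, j) keeps the A tile resident across the inner loop; real implementations also pack tiles into contiguous buffers before the micro-kernel runs.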
leokruglikov/CUDA-notes
Personal notes on CUDA programming
mortennobel/cpp-cheatsheet
Modern C++ Cheatsheet
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
dair-ai/ML-Papers-of-the-Week
🔥Highlighting the top ML papers every week.
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
9rum/flatflow
A learned system for parallel training of neural networks
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
Dao-AILab/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
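"Causal depthwise" means each channel gets its own filter and the input is padded only on the left, so output position t never sees future samples. A reference sketch in NumPy (not the repo's CUDA kernel; names are illustrative):

```python
import numpy as np

def causal_depthwise_conv1d(x, w):
    """Causal depthwise 1D convolution.

    x: (channels, length) input
    w: (channels, kernel) one filter per channel (depthwise);
       w[:, -1] weights the current sample.
    Left-padding by kernel-1 makes output[:, t] depend only
    on x[:, :t+1] (no lookahead).
    """
    C, L = x.shape
    _, K = w.shape
    xp = np.pad(x, ((0, 0), (K - 1, 0)))  # pad on the left only
    out = np.zeros((C, L), dtype=float)
    for k in range(K):
        # shifted slice: tap k sees the sample k-(K-1) steps back
        out += w[:, k:k+1] * xp[:, k:k+L]
    return out
```

With the filter `[0, 1]` per channel the output reproduces the input exactly, which is a quick way to check the causality convention.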
brendangregg/FlameGraph
Stack trace visualizer
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
LitLeo/OpenCUDA
NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
te42kyfo/gpu-benches
collection of benchmarks to measure basic GPU capabilities
arcee-ai/PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
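Similarity-based layer pruning rests on a simple measurement: compare the hidden state entering a block of consecutive layers with the one leaving it; the block that changes the representation least is the best candidate to drop. A hedged sketch of that scoring step (function and variable names are illustrative, not PruneMe's API):

```python
import numpy as np

def most_redundant_block(hidden_states, n_block):
    """Find the most redundant block of n_block consecutive layers.

    hidden_states: list of (d,) activation vectors, one per layer
    boundary (the model input, then each layer's output).
    Scores each candidate block by the cosine similarity between
    its input and output states; the highest-similarity block
    altered the representation least and is the pruning candidate.
    """
    best_start, best_sim = None, -np.inf
    for i in range(len(hidden_states) - n_block):
        a, b = hidden_states[i], hidden_states[i + n_block]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim > best_sim:
            best_start, best_sim = i, sim
    return best_start, best_sim
```

In practice the similarity is averaged over many calibration examples and token positions before choosing the block to remove.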
SakanaAI/evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
zinccat/Awesome-Triton-Kernels
Collection of kernels written in Triton language
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
CisMine/Guide-NVIDIA-Tools
NVIDIA tools guide
jaredhoberock/stanford-cs193g-sp2010
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
pentium3/sys_reading
system paper reading notes
Kobzol/hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.