leliyliu's Stars
UbiquitousLearning/mllm
Fast Multimodal LLM on Mobile Devices
casys-kaist/LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
DefTruth/CUDA-Learn-Notes
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
zhentingqi/rStar
agiresearch/AIOS
AIOS: LLM Agent Operating System
Jason-cs18/HetServe-LLMs
An Overview of Efficiently Serving Large Language Models across Edge Devices
intelligent-machine-learning/glake
GLake: optimizing GPU memory management and IO transmission.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
onejune2018/Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs. A curated list of tools, benchmarks/datasets, demos, leaderboards, and large models, focused on foundation-model evaluation and aimed at exploring the technical frontiers of generative AI.
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
ggerganov/ggml
Tensor library for machine learning
KnowingNothing/compiler-and-arch
A list of tutorials, papers, talks, and open-source projects on emerging compilers and architectures
metame-ai/awesome-llm-plaza
Awesome LLM plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLMs for coding, robotics, reasoning, multimodality, etc.
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
sramshetty/mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
Efficient-ML/Awesome-Efficient-LLM-Diffusion
A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and is continuously being improved. PRs for works (papers, repositories) the repo has missed are welcome.
casys-kaist/NeuPIMs
NeuPIMs Simulator
scale-snu/attacc_simulator
PSAL-POSTECH/ONNXim
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
CMU-SAFARI/ramulator2
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM standards, emerging RowHammer mitigation techniques). Described in our paper https://people.inf.ethz.ch/omutlu/pub/Ramulator2_arxiv23.pdf
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Examines aspects like computation, storage, transmission, and the hardware roofline model in a user-friendly interface.
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
AmadeusChan/Awesome-LLM-System-Papers
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
MARD1NO/CUDA-PPT
ptillet/torch-blocksparse
Block-sparse primitives for PyTorch
hpc-ulisboa/NDPmulator
A Full-System Framework for Simulating NDP devices from Caches to DRAM
ptillet/triton
Development repository for the Triton language and compiler
huggingface/pytorch_block_sparse
Fast block-sparse matrices for PyTorch