zejia-lin
Ph.D. student @sysu @arcsysu. GPU, Compiler, MLSys. φ(^∇^*) 🎶
Sun Yat-sen University, Guangzhou
zejia-lin's Stars
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Predidit/Kazumi
An anime-collection app driven by custom rules, with support for online streaming and danmaku (bullet comments).
google-research/vision_transformer
weishengying/tiny-flash-attention
A minimal flash-attention implementation built with CUTLASS, written for educational purposes.
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
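Both flash-attention entries above (tiny-flash-attention and flash-attention-minimal) implement the same core idea: compute attention tile by tile with an online softmax so the full N×N score matrix is never materialized. A minimal NumPy sketch of that forward pass, for illustration only (this is not code from either repo, and the function names are my own; the real kernels run these loops in on-chip SRAM):

```python
import numpy as np

def attention_tiled(Q, K, V, block=32):
    """Attention forward pass over K/V tiles with an online softmax.
    Equivalent to softmax(Q K^T / sqrt(d)) V, but never forms the
    full N x N score matrix at once."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(V, dtype=float)   # running (unnormalized) output
    m = np.full(N, -np.inf)             # running row-wise max of scores
    l = np.zeros(N)                     # running softmax denominator
    for j in range(0, N, block):        # iterate over tiles of K and V
        S = (Q @ K[j:j + block].T) * scale       # scores for this tile
        m_new = np.maximum(m, S.max(axis=1))     # updated row max
        p = np.exp(S - m_new[:, None])           # tile probabilities
        correction = np.exp(m - m_new)           # rescale previous state
        l = l * correction + p.sum(axis=1)
        O = O * correction[:, None] + p @ V[j:j + block]
        m = m_new
    return O / l[:, None]               # normalize at the end

def attention_ref(Q, K, V):
    """Standard (materialized) attention, used as a reference."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V
```

The rescaling by `exp(m - m_new)` is what lets the running sums stay numerically correct as new tiles shift the row maximum; the CUDA versions add the memory-hierarchy tiling that gives Flash Attention its speed.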
NVIDIA/TensorRT-Incubator
Experimental projects related to TensorRT
NVIDIA/cuda-python
CUDA Python Low-level Bindings
travitch/whole-program-llvm
A wrapper script to build whole-program LLVM bitcode files
MetaCubeX/mihomo
A simple rule-based tunnel in Go (formerly Clash Meta).
HPMLL/BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Hannibal046/Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources.
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
EfficientLLMSys/MuxServe
microsoft/chunk-attention
pytorch/workshops
This is a repository for all workshop related materials.
j2kun/mlir-tutorial
MLIR For Beginners tutorial
Whisky-App/Whisky
A modern Wine wrapper for macOS built with SwiftUI
siyan-zhao/prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
pengsida/learning_research
The author's research experience and advice.
zjhellofss/KuiperInfer
A great project for campus recruiting (autumn/spring hiring) and internships! Implement a high-performance deep learning inference library from scratch, step by step, supporting inference for models such as Llama 2, U-Net, YOLOv5, and ResNet.
jeffreysijuntan/lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
karpathy/llm.c
LLM training in simple, raw C/CUDA
intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
chenzomi12/AISystem
AISystem covers AI systems as a whole: AI chips, AI compilers, inference and training frameworks, and other full-stack low-level AI technologies.
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.