hnyls2002
@acm-21, RA @ucbrise, member @lm-sys @sgl-project Talk is cheap, show show way...
SJTU, UC Berkeley
hnyls2002's Stars
cli/cli
GitHub’s official command line tool
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
guidance-ai/guidance
A guidance language for controlling large language models.
HqWu-HITCS/Awesome-Chinese-LLM
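A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.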
abetlen/llama-cpp-python
Python bindings for llama.cpp
outlines-dev/outlines
Structured Text Generation
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
mamba-org/mamba
The Fast Cross-Platform Package Manager
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
ollama-webui/ollama-webui
ChatGPT-Style Web UI Client for Ollama 🦙
lark-parser/lark
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
yhzhang0128/egos-2000
Envision a future where every student can read all the code of a teaching operating system.
rustcc/writing-an-os-in-rust
"Writing an Operating System in Rust" (in Chinese)
zjunlp/LLMAgentPapers
Must-read Papers on LLM Agents.
Niek/chatgpt-web
ChatGPT web interface using the OpenAI API
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
skyzh/write-you-a-vector-db
A Vector Database Tutorial (over CMU-DB's BusTub system)
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
lambda7xx/awesome-AI-system
Papers and their code for AI systems
mkuchnik/relm
ReLM is a Regular Expression engine for Language Models
yichuan520030910320/MLsys_reading_list
A reading list on popular MLSys topics
wennitao/Advanced-Compiler
Advanced Compiler Assignment of ACM Class